Abstract. Channel capacity describes the size of the nearly ideal channels which can be obtained from many uses of a given channel, using an optimal error correcting code. In this paper we collect and compare minor and major variations in the mathematically precise statements of this idea which have been put forward in the literature. We show that all the variations considered lead to equivalent capacity definitions. In particular, it makes no difference whether one requires mean or maximal errors to go to zero. It also makes no difference whether errors are required to vanish for any sequence of block sizes compatible with the rate, or only for one infinite sequence.
1. Introduction, Outline and Notations

Quantum channel capacity is one of the key quantitative notions of the young field of 
quantum information theory. Whenever one asks "how much quantum information" 
can be stored in a device, or sent on a transmission line, it is implicitly a question 
about the capacity of a channel. Like Shannon's classical definition, the concept applies also to noisy channels, which do corrupt the signal. In this case one may apply an error correction scheme, and still use the channel like an almost ideal one. Capacity expresses this quantitatively. It is the maximal number of ideal qubit (resp. bit) transmissions per use of the channel, taken in the limit of long messages and using error correction schemes that asymptotically eliminate all errors.

Many of the terms in this informal definition can be, and in fact have been, 
formalized mathematically in different ways. As a result, there are many published 
definitions of quantum capacity in the literature. Some of these are immediately seen 
to be equivalent, but with other variants this is less obvious. Moreover, some of 
the differences seem to have gone unnoticed, creating the danger that some results 
would be unwittingly transferred between inequivalent concepts, creating a mixture 
of rigorous argument and folklore hard to unravel. 

The purpose of the present paper is to show that, fortunately, all the major definitions are indeed equivalent. In order to make the presentation self-contained, we have also included abridged versions of arguments from the literature. Other points seem to be new, e.g., the question whether the rate has to be achieved on every sequence of increasing blocks, or just on an infinite, possibly sparse set of increasing block sizes. We have also made an effort to lay out the required tools carefully, so that they can be used in other applications.

All this does not help much to come closer to the proof of a coding theorem, i.e., to a rigorous formula for the capacity that does not require the solution of asymptotically large optimization problems. Major progress in this direction has recently been obtained by Shor […] and Devetak [3]. We hope that our work will also contribute to an unambiguous interpretation of these results.

The key chapter (Section 2) of this paper begins by presenting the theme: a basic rigorous definition of quantum channel capacity. This is followed by nine logical variations on this theme, which, like musical variations, are not all of the same weight. In each variation a result is stated to the effect that a modified definition is equivalent to the basic one after all. All proofs, however, are left to the later sections. A Coda at the end of the variations comments on the coding theorem and recent developments.

2. Tema: Quantum Channel Capacity 

2.1. Choice of units 

2.2. Testing only one sequence 

2.3. Minimum fidelity 

2.4. Average fidelity 

2.5. Entanglement fidelity 

2.6. Entropy rate 

2.7. Errors vanishing quickly or not at all

2.8. Isometric encodings and homomorphic decodings 

2.9. Little help from a classical friend 
2.10. Coda: The Coding Theorem 

Notations 

In order to state the basic definition of capacity and its variations, we have to introduce 
some notation. A quantum channel transforms input systems described by a Hilbert space $\mathcal H_1$ into output systems described by a (possibly different) Hilbert space $\mathcal H_2$. It is represented by a completely positive trace-preserving linear map $T: B_*(\mathcal H_1) \to B_*(\mathcal H_2)$, where $B_*(\mathcal H)$ denotes the space of trace class operators on $\mathcal H$. This map takes the input state to the output state, i.e., we work in the Schrödinger picture. See Kraus' textbook [4] for a detailed description of the concept of quantum operations.

The definition of channel capacity requires the comparison of the channel after 
correction with an ideal channel. As a measure of the distance between two channels we take the norm of complete boundedness (or cb-norm, for short) […], denoted by $\|\cdot\|_{cb}$. For two channels $T$, $S$, the distance $\frac12\|T - S\|_{cb}$ can be defined as follows. Consider two statistical quantum experiments that differ only by exchanging one use of $S$ for one use of $T$. The distance is the largest difference between the overall probabilities in these two experiments. The experiments may involve entangling the systems on which the channels act with arbitrary further systems. Equivalently, we may set $\|T\|_{cb} = \sup_n \|T\otimes\mathrm{id}_n\|_\infty$. Here the norm is the norm of linear operators between the Banach spaces $B_*(\mathcal H)$, and $\mathrm{id}_n$ denotes the identity map (ideal channel) on the $n\times n$ matrices.

Several properties make the cb-norm well-suited for capacity estimates. One is multiplicativity, $\|T_1\otimes T_2\|_{cb} = \|T_1\|_{cb}\,\|T_2\|_{cb}$. Another is unitality, $\|T\|_{cb} = 1$ for any channel $T$. The equivalence with other error measures is discussed extensively below.

Note that throughout this work we use base two logarithms, and we write $\mathrm{ld}\,x := \log_2 x$.



2. Tema: Quantum Channel Capacity 

Definition 2.1 A positive number $R$ is called achievable rate for the quantum channel $T: B_*(\mathcal H_1) \to B_*(\mathcal H_2)$ with respect to the quantum channel $S: B_*(\mathcal K_1) \to B_*(\mathcal K_2)$ iff the following holds. For any pair of integer sequences $(n_\nu)_{\nu\in\mathbb N}$ and $(m_\nu)_{\nu\in\mathbb N}$ with $\lim_{\nu\to\infty} n_\nu = \infty$ and $\limsup_{\nu\to\infty} m_\nu/n_\nu \le R$ we have

$$\lim_{\nu\to\infty} \Delta(n_\nu, m_\nu) = 0, \qquad (1)$$

where we set

$$\Delta(n_\nu, m_\nu) := \inf_{D,E} \big\| D\,T^{\otimes n_\nu} E - S^{\otimes m_\nu} \big\|_{cb}, \qquad (2)$$

the infimum taken over all encoding channels $E$ and decoding channels $D$ with suitable domain and range.

The channel capacity $Q(T, S)$ of $T$ with respect to $S$ is defined to be the supremum of all achievable rates. The quantum capacity is the special case $Q(T) := Q(T, \mathrm{id}_2)$, with $\mathrm{id}_2$ being the ideal qubit channel.

This definition is a transcription of Claude E. Shannon's definition of the capacity of a discrete memoryless channel in classical information theory. Shannon presented it originally in his famous 1948 paper […], and it is now found in most standard textbooks on the subject (e.g., […]). The translation needs only two steps. First, express Shannon's maximal error probabilities in terms of norm estimates […]. Second, take an ideal one-bit channel rather than the one-qubit channel as the reference. This choice can also be made for quantum channels $T$, defining the capacity $C(T)$ of a quantum channel for classical information. Much more is known about $C(T)$ than about $Q(T)$ [10, 11].



2.1. Prima Variazione: Choice of Units 

Formally, Definition 2.1 assigns a special role to the ideal qubit channel $\mathrm{id}_2$. Is this essential? What do we get if we take as reference the ideal channel $\mathrm{id}_n$ on a Hilbert space of some dimension $n > 2$?

We will show in Section 3.2 that the choice $n = 2$ only amounts to a choice of units, fixing the unit bit:

$$Q(T, \mathrm{id}_n) = \frac{\mathrm{ld}\,m}{\mathrm{ld}\,n}\,Q(T, \mathrm{id}_m). \qquad (3)$$
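Equation (3) says that changing the reference channel only rescales the capacity. The following minimal sketch does this bookkeeping. The helper `convert_capacity` and the input value 1.5 are hypothetical and serve only as an illustration; they are not a capacity computed in this paper.

```python
import math


def convert_capacity(q_in_m_units: float, m: int, n: int) -> float:
    """Rescale Q(T, id_m) into Q(T, id_n) using Eq. (3): Q(T, id_n) = (ld m / ld n) * Q(T, id_m)."""
    if m < 2 or n < 2:
        raise ValueError("reference channels need Hilbert space dimension >= 2")
    return math.log2(m) / math.log2(n) * q_in_m_units


# Hypothetical channel with Q(T) = Q(T, id_2) = 1.5 qubits per channel use.
q_qubits = 1.5
q_four_level = convert_capacity(q_qubits, m=2, n=4)   # four-level systems per use
q_back = convert_capacity(q_four_level, m=4, n=2)     # back to qubits
print(q_four_level, q_back)                           # 0.75 1.5
```

Going from qubits to four-level systems halves the number, because one four-level system carries two qubits' worth of dimension.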

2.2. Seconda Variazione: Testing Only One Sequence 

At first sight Definition 2.1 of channel capacity seems a little impractical, although it is given above and widely used throughout the community [9, 12, 13, 14, 15]. The problem is that testing a given rate $R$ involves checking an infinite number of pairs of sequences. Work would be substantially reduced if only one such pair had to be tested.

For the sake of discussion, let us call a rate $R$ sporadically achievable if the following holds. For some pair of sequences $(n_\nu)_{\nu\in\mathbb N}$, $(m_\nu)_{\nu\in\mathbb N}$ with $n_\nu \to \infty$ and vanishing errors, the rate $R$ is achieved infinitely often: $\lim_{\nu\to\infty} m_\nu/n_\nu = R$. For example, there might be a special coding scheme which utilizes some rare number theoretical properties of $n$ and $m$. Many published definitions [16, 17, 18, 19] would accept sporadically achievable rates as achievable in the capacity definition. Often the choice $n_\nu = \nu$ is made [20, 21, 22, 23, 24]. This sequence of block sizes can hardly be called "sporadic". Still, it is a logically similar variation to the sparse sequences, so we include it for convenience.

In Section […] we show that all sporadically achievable rates are, in fact, achievable. Hence there is no need to introduce a "sporadic capacity". What we have to show is that coding schemes that work infinitely often can be extended to all block sizes. This is a non-trivial result. We also show that such an extension is not possible by merely putting blocks together and perhaps leaving some of the code bits unused.

2.3. Terza Variazione: Minimum Fidelity

The cb-norm is by no means the only way to evaluate the distance between two channels. Another distance measure has appeared particularly widely (e.g., in […]). It is the minimal overlap between input states and the corresponding output states. The minimum fidelity of a quantum channel $T: B_*(\mathcal H) \to B_*(\mathcal H)$ is defined as

$$F(T) := \min\big\{\langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle \;\big|\; \psi\in\mathcal H,\ \|\psi\| = 1\big\}. \qquad (4)$$

When we want to particularly emphasize the Hilbert space $\mathcal H$ on which the minimization is performed, we will write $F(\mathcal H, T)$ instead.

Of course, $0 \le F(\mathcal H, T) \le 1$. Moreover, $F(\mathcal H, T) = 1$ implies that $T$ acts as the ideal channel on $\mathcal H$: $T|_{\mathcal H} = \mathrm{id}_{\mathcal H}$. These features make the minimum fidelity a suitable distance measure. We might then call a positive number $R$ an achievable rate for the channel $T$ if the following holds. There is a sequence $(\mathcal K_n)_{n\in\mathbb N}$ of Hilbert spaces such that $\lim_{n\to\infty} (\mathrm{ld}\dim\mathcal K_n)/n = R$, and $\lim_{n\to\infty} F(\mathcal K_n, D_n T^{\otimes n} E_n) = 1$ for suitable encodings $(E_n)_{n\in\mathbb N}$ and decodings $(D_n)_{n\in\mathbb N}$.

In Section 4.2 we show that the quantum channel capacity arising from this definition is the same.

2.4. Quarta Variazione: Average Fidelity

Instead of requiring that the maximum error be small, we might be less demanding and just require an average error to vanish. In the previous section we would then replace the minimum fidelity $F(T)$ by the average fidelity,

$$\bar F(T) := \int \langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle\, d\psi, \qquad (5)$$

where the integral is over the normalized unitarily invariant measure $d\psi$ on the unit vectors in $\mathcal H$.

In Sections 4.3 and 4.4 we show that this modification has no effect on the quantum channel capacity. An alternative proof is presented in Section 6.2.
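Here is a minimal numerical sketch that makes Equations (4) and (5) concrete for a single qubit. It uses the amplitude-damping channel with Kraus operators $K_0 = \mathrm{diag}(1, \sqrt{1-\gamma})$ and $K_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$. This is a standard example, but the text does not discuss it; it is an assumption of this illustration. The exact minimum and integral are replaced by a grid over the Bloch sphere, with arbitrarily chosen grid sizes. For this channel one can check by hand that the minimum fidelity is $1-\gamma$, attained at $|1\rangle$. The average fidelity is $(3 + 2s + s^2)/6$ with $s = \sqrt{1-\gamma}$.

```python
import cmath
import math


def amplitude_damping_kraus(gamma):
    """Kraus operators of the qubit amplitude-damping channel (illustrative choice)."""
    s = math.sqrt(1.0 - gamma)
    return [[[1.0, 0.0], [0.0, s]],
            [[0.0, math.sqrt(gamma)], [0.0, 0.0]]]


def pure_state_fidelity(kraus, psi):
    """<psi|T(|psi><psi|)|psi> = sum_i |<psi|K_i|psi>|^2 for T(rho) = sum_i K_i rho K_i^*."""
    total = 0.0
    for k in kraus:
        k_psi = [k[0][0] * psi[0] + k[0][1] * psi[1],
                 k[1][0] * psi[0] + k[1][1] * psi[1]]
        amp = psi[0].conjugate() * k_psi[0] + psi[1].conjugate() * k_psi[1]
        total += abs(amp) ** 2
    return total


def qubit_state(u, phi):
    """Unit vector with Bloch-sphere coordinates cos(theta) = u and azimuth phi."""
    return [complex(math.sqrt(max(0.0, (1 + u) / 2))),
            cmath.exp(1j * phi) * math.sqrt(max(0.0, (1 - u) / 2))]


def min_and_average_fidelity(kraus, n_u=400, n_phi=8):
    phis = [2 * math.pi * j / n_phi for j in range(n_phi)]
    # Eq. (4): minimum over a grid that includes both poles u = -1 and u = +1.
    f_min = min(pure_state_fidelity(kraus, qubit_state(-1 + 2 * i / n_u, p))
                for i in range(n_u + 1) for p in phis)
    # Eq. (5): the unitarily invariant measure on qubit states is uniform in
    # u = cos(theta) and in phi, so a midpoint rule in u approximates the integral.
    f_avg = sum(pure_state_fidelity(kraus, qubit_state(-1 + (2 * i + 1) / n_u, p))
                for i in range(n_u) for p in phis) / (n_u * n_phi)
    return f_min, f_avg


gamma = 0.3
s = math.sqrt(1 - gamma)
f_min, f_avg = min_and_average_fidelity(amplitude_damping_kraus(gamma))
print(f_min, 1 - gamma)                   # minimum is attained at psi = |1>
print(f_avg, (3 + 2 * s + s * s) / 6)     # hand-derived closed form for this channel
```

The uniform grid in $u = \cos\theta$ and $\varphi$ works because the unitarily invariant measure on qubit pure states is the uniform measure on the Bloch sphere. Higher dimensions would need a different sampling scheme.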

2.5. Quinta Variazione: Entanglement Fidelity 

Entanglement fidelity was introduced by Ben Schumacher in 1996 [26] and is closely related to minimum fidelity. It characterizes how well the noisy channel preserves the entanglement between the input states and a reference system that does not undergo the noise process. For a quantum channel $T: B_*(\mathcal H) \to B_*(\mathcal H)$ and a quantum state $\rho \in B_*(\mathcal H)$, the entanglement fidelity of $\rho$ with respect to $T$, $F_e(\rho, T)$, is given as

$$F_e(\rho, T) := \langle\psi|(T\otimes\mathrm{id})(|\psi\rangle\langle\psi|)|\psi\rangle, \qquad (6)$$

where $\psi$ is a purification of $\rho$. This quantity does not depend on the details of the purification process, as is made evident by the alternative expression [26]

$$F_e(\rho, T) = \sum_i |\mathrm{tr}(\rho\, t_i)|^2, \qquad (7)$$

where $T(\sigma) = \sum_i t_i \sigma t_i^*$ is the Kraus decomposition of $T$ [4]. Obviously, $0 \le F_e(\rho, T) \le 1$. Moreover, $F_e(\rho, T) = 1$ implies that $T$ is noiseless on the support of $\rho$: $T|_{\mathrm{supp}(\rho)} = \mathrm{id}_{\mathrm{supp}(\rho)}$.
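The purification independence can be checked directly. In the sketch below, a purification is written as $\psi = \sum_{jk} c_{jk}|j\rangle|k\rangle$ with reduced state $\rho = cc^*$. Then $(t_i\otimes\mathrm{id})\psi$ corresponds to $t_i c$, and Equation (6) becomes $\sum_i |\mathrm{tr}(c^* t_i c)|^2$. By cyclicity of the trace this equals Equation (7) for every choice of $c$. The channel (qubit amplitude damping) and the state $\rho = \mathrm{diag}(0.8, 0.2)$ are illustrative assumptions. The small matrix helpers are written out only to keep the example dependency-free.

```python
import math


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dagger(a):
    """Conjugate transpose of a nested-list matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))] for i in range(len(a[0]))]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def fe_kraus(rho, kraus):
    """Eq. (7): F_e(rho, T) = sum_i |tr(rho t_i)|^2."""
    return sum(abs(trace(matmul(rho, t))) ** 2 for t in kraus)


def fe_purification(c, kraus):
    """Eq. (6), evaluated with the Kraus form of T, for psi = sum_{jk} c[j][k] |j>|k>
    (reduced state rho = c c^*): (t_i (x) id) maps c to t_i c, so
    <psi|(t_i (x) id)|psi> = tr(c^* t_i c)."""
    return sum(abs(trace(matmul(dagger(c), matmul(t, c)))) ** 2 for t in kraus)


gamma = 0.3
s = math.sqrt(1 - gamma)
kraus = [[[1.0, 0.0], [0.0, s]], [[0.0, math.sqrt(gamma)], [0.0, 0.0]]]   # amplitude damping
rho = [[0.8, 0.0], [0.0, 0.2]]
h = 1 / math.sqrt(2)
c1 = [[math.sqrt(0.8), 0.0], [0.0, math.sqrt(0.2)]]   # purification built from sqrt(rho)
c2 = matmul(c1, [[h, h], [h, -h]])                     # another one: c2 c2^* = c1 c1^* = rho
print(fe_kraus(rho, kraus), fe_purification(c1, kraus), fe_purification(c2, kraus))
```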

We might then define achievable rates exactly as in Section 2.3 above. The condition $\lim_{n\to\infty} F(\mathcal K_n, D_n T^{\otimes n} E_n) = 1$ is replaced by the requirement that

$$\lim_{n\to\infty}\ \inf_{\rho\in B_*(\mathcal K_n)} F_e(\rho, D_n T^{\otimes n} E_n) = 1 \qquad (8)$$

for suitable encodings $(E_n)_{n\in\mathbb N}$ and decodings $(D_n)_{n\in\mathbb N}$.

The quantum capacity that stems from this definition of achievable rates is likewise equivalent, as shown in Section […].

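In the previous definition, instead of minimizing over all density operators $\rho \in B_*(\mathcal H)$, one can simply choose $\rho$ to be the maximally mixed state on $\mathcal H$, $\rho := \mathbb 1_{\mathcal H}/d$, with the shorthand $d := \dim\mathcal H$. We call the resulting variant of entanglement fidelity the channel fidelity [27] of the quantum channel $T$,

$$F_c(T) := F_e(\mathbb 1_{\mathcal H}/d,\, T) = \langle\Omega|(T\otimes\mathrm{id}_{\mathcal H})(|\Omega\rangle\langle\Omega|)|\Omega\rangle, \qquad (9)$$

where $\Omega = d^{-1/2}\sum_{k=1}^d |k\rangle\otimes|k\rangle$ is a maximally entangled state on $\mathcal H\otimes\mathcal H$.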

Channel fidelity is a very handy figure of merit for three reasons. It is a linear functional, it does not involve a maximization process, and it is completely equivalent to the error criteria discussed above. The details are spelled out in Section […].
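We have $|\Omega\rangle\langle\Omega| = d^{-1}\sum_{j,k}|j\rangle\langle k|\otimes|j\rangle\langle k|$. Therefore Equation (9) can be rewritten as $F_c(T) = d^{-2}\sum_{j,k}\langle j|T(|j\rangle\langle k|)|k\rangle$, which makes the linearity in $T$ evident. The minimal sketch below treats a channel simply as a Python function on matrices. The example channels (identity, completely depolarizing, amplitude damping) are illustrative choices that the text does not discuss.

```python
import math


def matrix_unit(d, j, k):
    """|j><k| as a d x d nested list."""
    return [[1.0 if (r == j and c == k) else 0.0 for c in range(d)] for r in range(d)]


def channel_fidelity(channel, d):
    """Eq. (9) expanded in matrix units: F_c(T) = d^{-2} sum_{j,k} <j|T(|j><k|)|k>.
    `channel` is any Python function mapping a d x d matrix to a d x d matrix."""
    total = 0j
    for j in range(d):
        for k in range(d):
            total += channel(matrix_unit(d, j, k))[j][k]
    return total.real / d ** 2


def identity(x):
    return [row[:] for row in x]


def completely_depolarizing(x):
    """X -> tr(X) 1/d."""
    d = len(x)
    t = sum(x[i][i] for i in range(d))
    return [[t / d if r == c else 0.0 for c in range(d)] for r in range(d)]


def amplitude_damping(gamma):
    s = math.sqrt(1 - gamma)

    def t(x):
        # K0 x K0^* + K1 x K1^* with K0 = diag(1, s), K1 = sqrt(gamma) |0><1|
        return [[x[0][0] + gamma * x[1][1], s * x[0][1]],
                [s * x[1][0], s * s * x[1][1]]]
    return t


def mixture(p, t1, t2):
    """The channel p*T1 + (1 - p)*T2."""
    return lambda x: [[p * a + (1 - p) * b for a, b in zip(r1, r2)]
                      for r1, r2 in zip(t1(x), t2(x))]


p = 0.25
f_mix = channel_fidelity(mixture(p, identity, completely_depolarizing), 2)
f_lin = p * channel_fidelity(identity, 2) + (1 - p) * channel_fidelity(completely_depolarizing, 2)
print(f_mix, f_lin)                                   # linearity: both 0.4375
print(channel_fidelity(amplitude_damping(0.3), 2))    # equals (1 + sqrt(0.7))^2 / 4
```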

A further variant arises when, in the definition of channel fidelity, an arbitrary input state $\Gamma \in \mathcal H\otimes\mathcal H$ is permitted instead of the maximally entangled state $\Omega$. The channel fidelity $F_c(T)$ is then replaced by the quantity $F_c(T, \Gamma) := \langle\Gamma|(T\otimes\mathrm{id}_{\mathcal H})(|\Gamma\rangle\langle\Gamma|)|\Gamma\rangle$. This is the error quantity on which I. Devetak's entanglement generating capacity [3] is built, and it is likewise equivalent (Section 4.5).



2.6. Sesta Variazione: Entropy Rate

The original definition of quantum channel capacity in terms of entanglement fidelity computes the rates in a different way [16, 18]. According to this definition, the capacity of a quantum channel is the maximal entropy rate of a quantum source whose entanglement with the reference system is preserved by the noisy channel. A quantum source $(\mathcal K_n, \rho_n)_{n\in\mathbb N}$ consists of a sequence of Hilbert spaces $\mathcal K_n$ and a sequence of corresponding density operators $\rho_n \in B_*(\mathcal K_n)$. It is meant to represent a stream of quantum particles produced by some physical process. Its entropy rate is defined as

$$R = \lim_{n\to\infty} \frac{1}{n}\, S(\rho_n), \qquad (10)$$

where $S(\rho) = -\mathrm{tr}(\rho\,\mathrm{ld}\,\rho)$ is the von Neumann entropy.

The quantum capacity as defined by Schumacher is then the supremum of all entropy rates of sources for which $\lim_{n\to\infty} F_e(\rho_n, D_n T^{\otimes n} E_n) = 1$, for suitable encodings $(E_n)_{n\in\mathbb N}$ and decodings $(D_n)_{n\in\mathbb N}$.

It turns out that some mild constraint on the sources is needed to make this definition equivalent to the others. In fact, we will show in Section 4.4 that the supremum over all sources will be infinite for all channels with positive capacity. However, equivalence does hold for a wide range of interesting sources, namely (cf. Section […])

• if $\rho_n = \mathbb 1_{\mathcal K_n}/\dim\mathcal K_n$, which brings us back to the definition based on channel fidelity discussed in the previous section,

• if the source satisfies the so-called asymptotic equipartition property, which has recently been established for general stationary ergodic quantum sources […], or

• if the dimension of the ambient space of the encodings grows at most exponentially (even at a rate much larger than the capacity).
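Equation (10) can be illustrated with two toy sources. Both are assumptions made only for this example. The first is an i.i.d. source $\rho_n = \rho^{\otimes n}$ built from a qubit state with spectrum (0.9, 0.1); its entropy rate is $S(\rho)$ for every $n$. The second is a maximally mixed source as in the first bullet above, with $\dim\mathcal K_n = 2^{\lfloor nR\rfloor}$ for $R = 0.5$. The sketch works with spectra only, since the von Neumann entropy depends on the eigenvalues alone.

```python
import math
from itertools import product


def von_neumann_entropy(eigenvalues):
    """S(rho) = -tr(rho ld rho) from the spectrum of rho; zero eigenvalues contribute 0."""
    return -sum(p * math.log2(p) for p in eigenvalues if p > 0)


def iid_spectrum(single, n):
    """Eigenvalues of rho^{(x) n}: all n-fold products of eigenvalues of rho."""
    return [math.prod(c) for c in product(single, repeat=n)]


# Source 1 (illustrative): i.i.d. copies of a qubit state with spectrum (0.9, 0.1).
single = [0.9, 0.1]
iid_rates = [von_neumann_entropy(iid_spectrum(single, n)) / n for n in range(1, 11)]

# Source 2 (first bullet above): rho_n = 1/dim K_n with dim K_n = 2^floor(n R), R = 0.5.
R = 0.5
mixed_rates = [von_neumann_entropy([2.0 ** -math.floor(n * R)] * 2 ** math.floor(n * R)) / n
               for n in range(1, 21)]
print(iid_rates[-1], von_neumann_entropy(single))
print(mixed_rates[-1])
```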

2.7. Settima Variazione: Errors vanishing quickly or not at all

In the definitions of achievable rates presented so far, the error quantity is simply required to approach zero in the large block limit. Instead, one could impose a certain minimum speed of convergence as a function of the number of channel invocations, e.g., linear, polynomial, exponential, or super-exponential convergence. We will show in Section […] that all these definitions coincide, as long as the required speed of convergence is at most exponential.

Equivalence no longer holds if we require the errors to vanish even faster. The extreme case is $\Delta(n_\nu, m_\nu) = 0$ for large enough $\nu$, as in the theory of error correcting codes invented by Knill and Laflamme. Suppose a channel has a small, but non-vanishing probability for depolarization. Then the same also holds for its tensor powers, and no such channel allows the perfect transmission of even one qubit. Hence the capacity based on exactly vanishing errors will be zero for such channels.

On the other hand, one might sometimes feel inclined to tolerate (small) finite errors in transmission. For some $\varepsilon > 0$, let $Q_\varepsilon(T)$ denote the quantity defined exactly like the quantum channel capacity in Definition 2.1, but requiring only $\Delta(n_\nu, m_\nu) < \varepsilon$ for large $\nu$ instead of $\lim_{\nu\to\infty}\Delta(n_\nu, m_\nu) = 0$. Obviously, $Q_\varepsilon(T) \ge Q(T)$ for any quantum channel $T$. We even have $\lim_{\varepsilon\to 0} Q_\varepsilon(T) = Q(T)$ (see Section […]).

In the purely classical setting even more is known. If $\varepsilon > 0$ is small enough, one cannot achieve bigger rates by allowing small errors, i.e., $C_\varepsilon(T) = C(T)$. This is the so-called strong converse to Shannon's coding theorem. It is still unknown whether an analogous property holds for quantum channels.

2.8. Ottava Variazione: Isometric Encodings and Homomorphic Decodings 

Definition 2.1 of channel capacity involves an optimization over the set of all encoding and decoding maps. This set is very large, so it may seem favorable to restrict both encoding and decoding to smaller classes.

In [17] it has been shown that we may restrict our attention to isometric encodings and still be left with the same capacity (see Section […] for details). These are encodings of the form

$$E(\rho) = V\rho V^*, \qquad (11)$$

with isometric $V$. Physically, this means that encoding can always be thought of as a unitary process, augmented by an initial projection onto a subspace small enough to fit into the channel.

Consider the Knill/Laflamme setting of perfect error correction [31]. There, not only are encoding maps isometric, but in addition the decodings can be chosen to be of the (Heisenberg picture) form

$$D^*(X) = V(X\otimes\mathbb 1)V^* + \mathrm{tr}(\rho_0 X)\,(\mathbb 1 - VV^*), \qquad (12)$$

with isometric $V$ and an arbitrary reference state $\rho_0$. We call maps of this type homomorphic, because the first term is an algebraic homomorphism. The second term only serves to render the whole channel unital.

The sufficiency of isometric encoding transfers from the perfect error correction setting to asymptotically perfect error correction. It may therefore seem reasonable to conjecture that a similar result holds for homomorphic decodings. However, up to now no such result is known.

2.9. Nona Variazione: Coding with a Little Help from a Classical Friend

Here we consider a setup in which a quantum channel $T$ is assisted by additional classical forward communication between the sender (Alice) and the receiver (Bob). Clearly, this allows Alice and Bob to collaborate in a more coordinated fashion. Alice may use the additional resource to transfer information about the encoding process. Bob, on his part, may try to take advantage of this information in his choice of the decoding channel.

However, these new possibilities do not help to increase the channel capacity, even if the classical side channel is noiseless. This is a straightforward consequence of the isometric encoding theorem: we have $Q(T\otimes\mathrm{id}_c) = Q(T)$ [25, 17], where $\mathrm{id}_c$ denotes an ideal classical channel of arbitrarily large dimensionality. This is not a trivial statement, as one sees from the observation that classical feedback between successive channel uses may increase the capacity […].

The uselessness of classical forward communication may be extended to cover so-called separable side channels. These are quantum channels with intermediate measurement and re-preparation processes. The details on both classes of side channels are spelled out in Section […].

2.10. Coda: The Coding Theorem

Computing channel capacities on the basis of the definitions given, even the simplified ones, is a tricky business. It involves optimization in systems of asymptotically many tensor factors. It has therefore been a long-time challenge to find a quantum analogue of Shannon's noisy coding theorem […]. Such a theorem would allow one to compute the channel capacity as an optimization over a low-dimensional space.

According to Shannon's famous theorem, the classical capacity is obtained by finding the supremum of the so-called mutual information, which itself is given in terms of the Shannon entropy. A quantum analogue of mutual information, called coherent information, was identified early on. For the quantum channel $T: B_*(\mathcal H_1) \to B_*(\mathcal H_2)$ and the density operator $\rho \in B_*(\mathcal H_1)$, it is defined as

$$I_c(\rho, T) := S\big(T(\rho)\big) - S\big((T\otimes\mathrm{id})(|\psi\rangle\langle\psi|)\big), \qquad (13)$$

where $\psi \in \mathcal H_1\otimes\mathcal H_1$ is a purification of the density operator $\rho$, and $S$, as before, is the von Neumann entropy.

The regularized coherent information has long been known to be an upper bound on the quantum channel capacity [16, 18, 17], i.e.,

$$Q(T) \le \lim_{n\to\infty} \frac{1}{n}\max_\rho I_c(\rho, T^{\otimes n}). \qquad (14)$$

Unlike the classical or quantum mutual information, coherent information is not additive. Hence taking the limit $n\to\infty$ in Equation (14) is indeed required [21].

The first sketch of an argument for how to close the gap in Equation (14) was given by Seth Lloyd […]. At a recent conference, Peter Shor presented [1, 2] a coding scheme based on random coding that attains the coherent information. His results have not been published yet. Shortly thereafter, Igor Devetak released [3] another coding scheme, based on a key generation protocol made "coherent" [34, 35]. By the same techniques, Devetak and Winter […] very recently were able to prove the long-conjectured hashing inequality [25]. It states that the regularized coherent information is an achievable entanglement distillation rate, and it implies the channel capacity result by teleportation [22].

These achievements certainly mark a major step in the direction of a coding theorem, but they do not satisfy all the properties desired of such a theorem. In particular, they still demand the solution of asymptotically large variational problems.

3. Elementary Properties of Channel Capacity 

3.1. Basic Inequalities

Before we enter the proof sections, we need to review some basic properties of channel capacity. They will turn out to be helpful as we proceed, but they are also interesting in their own right. All proofs are easy and may be found in […], albeit for noiseless reference channels only. The generalization is straightforward.

Suppose two channels, $T_1$ and $T_2$, are run in succession. The capacity of the composite channel, $T_1\circ T_2$, cannot be bigger than the capacity of the channel with the smaller bandwidth. This is known as the bottleneck inequality:

$$Q(T_1\circ T_2, S) \le \min\{Q(T_1, S),\, Q(T_2, S)\}. \qquad (15)$$
Instead of running $T_1$ and $T_2$ in succession, we may also run them in parallel, which is represented mathematically by the tensor product $T_1\otimes T_2$. In this case the capacity can be shown to be super-additive,

$$Q(T_1\otimes T_2, S) \ge Q(T_1, S) + Q(T_2, S). \qquad (16)$$

For the standard ideal channels we even have additivity. The same holds true if both $S$ and one of the channels $T_1$, $T_2$ are noiseless, with the third channel arbitrary. However, deciding whether additivity holds in general is one of the big open problems in the field.



Finally, the two-step coding inequality tells us that using an intermediate channel in the coding process cannot increase the transmission rate:

$$Q(T_1, T_3) \ge Q(T_1, T_2)\,Q(T_2, T_3). \qquad (17)$$
3.2. Quantum Capacity of Noiseless Channels

In some special cases the quantum channel capacity can be evaluated relatively easily. The most relevant one is the noiseless channel $\mathrm{id}_n$, where the subscript $n$ denotes the dimension of the underlying Hilbert space. In this case we have

$$Q(\mathrm{id}_n, \mathrm{id}_m) = \frac{\mathrm{ld}\,n}{\mathrm{ld}\,m}. \qquad (18)$$

A proof follows below. Combining this with the two-step coding inequality (17), once with $\mathrm{id}_m$ and once with $\mathrm{id}_n$ as the intermediate channel, we see that for any quantum channel $T$

$$Q(T, \mathrm{id}_n) = \frac{\mathrm{ld}\,m}{\mathrm{ld}\,n}\,Q(T, \mathrm{id}_m). \qquad (19)$$

This shows that quantum channel capacities relative to noiseless quantum channels of different dimensionality only differ by a constant factor. Fixing the dimensionality of the reference channel then only corresponds to a choice of units. Conventionally, the ideal qubit channel is chosen as the standard of reference, fixing the unit bit.

Proof of Equation (18): This is an immediate consequence of estimates of the simulation error $\Delta(\mathrm{id}_n, \mathrm{id}_m) = \inf_{D,E}\|D\,\mathrm{id}_n E - \mathrm{id}_m\|_{cb}$ between ideal channels. We have

$$\Delta(\mathrm{id}_n, \mathrm{id}_m) = 0 \quad \text{if } m \le n, \qquad (20)$$

$$\Delta(\mathrm{id}_n, \mathrm{id}_m) \ge 1 - \frac{n}{m} \quad \text{if } m > n. \qquad (21)$$

The first relation is shown by explicitly constructing $E: B_*(\mathbb C^m) \to B_*(\mathbb C^n)$ and $D: B_*(\mathbb C^n) \to B_*(\mathbb C^m)$ such that $D\,\mathrm{id}_n E = \mathrm{id}_m$. To this end we may consider $\mathbb C^m \subset \mathbb C^n$ as a subspace with projection $P_m$. Then $E$ is defined by extending each $m\times m$ matrix by zeros for the additional $(n - m)$ dimensions, and

$$D(\rho) = P_m\rho P_m + \frac{\mathrm{tr}\big((\mathbb 1 - P_m)\rho\big)}{m}\,P_m, \qquad (22)$$

where the second term serves to make $D$ trace-preserving. Then, clearly, $DE = \mathrm{id}_m$ as claimed.
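The encoding and decoding used for Equation (20) are simple enough to write out. The following sketch assumes that density matrices are stored as nested Python lists. The function `encode` pads an $m\times m$ matrix with zeros, and `decode` implements Equation (22). Both function names are illustrative.

```python
def encode(rho, n):
    """E: embed an m x m matrix into n x n by padding with zeros (C^m as a subspace of C^n)."""
    m = len(rho)
    return [[rho[i][j] if (i < m and j < m) else 0.0 for j in range(n)] for i in range(n)]


def decode(sigma, m):
    """D of Eq. (22): keep the block P_m sigma P_m and return the weight
    tr((1 - P_m) sigma) lying outside C^m as the maximally mixed state P_m / m."""
    n = len(sigma)
    leaked = sum(sigma[i][i] for i in range(m, n))
    return [[sigma[i][j] + (leaked / m if i == j else 0.0) for j in range(m)] for i in range(m)]


rho = [[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]]       # a qubit density matrix (m = 2)
print(decode(encode(rho, 5), 2) == rho)              # D E = id_m: True

sigma = [[0.5, 0.0, 0.0], [0.0, 0.3, 0.0], [0.0, 0.0, 0.2]]   # a state on C^3
out = decode(sigma, 2)
print(out[0][0] + out[1][1])                         # trace is preserved (1.0 up to rounding)
```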

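To prove the inequality (21), choose a maximal family of one-dimensional orthogonal projections $\{P_\nu\}_{\nu=1,\dots,m} \subset B_*(\mathbb C^m)$ such that $\sum_{\nu=1}^m P_\nu = \mathbb 1_m$. Then for any decoding $D: B_*(\mathbb C^n) \to B_*(\mathbb C^m)$, the relation

$$\mathrm{tr}(\rho F_\nu) := \mathrm{tr}\big(D(\rho)P_\nu\big) \quad \forall\rho\in B_*(\mathbb C^n) \qquad (23)$$

defines a set $\{F_\nu\}_{\nu=1,\dots,m} \subset B(\mathbb C^n)$ of positive operators satisfying $\sum_{\nu=1}^m F_\nu = \mathbb 1_n$. For any encoding $E: B_*(\mathbb C^m) \to B_*(\mathbb C^n)$ we thus have

$$\begin{aligned}
n = \mathrm{tr}\,\mathbb 1_n &= \sum_{\nu=1}^m \mathrm{tr}\,F_\nu \\
&\ge \sum_{\nu=1}^m \mathrm{tr}\big(E(P_\nu)F_\nu\big) \\
&= \sum_{\nu=1}^m \mathrm{tr}\big(D(E(P_\nu))P_\nu\big) \\
&\ge \sum_{\nu=1}^m \mathrm{tr}\,P_\nu - \sum_{\nu=1}^m \Big|\mathrm{tr}\Big(P_\nu\big(D(E(P_\nu)) - P_\nu\big)\Big)\Big| \\
&\ge m - \sum_{\nu=1}^m \big\|DE(P_\nu) - P_\nu\big\|_1 \\
&\ge m\,\big(1 - \|DE - \mathrm{id}_m\|_\infty\big).
\end{aligned} \qquad (24)$$

The second line uses $0 \le E(P_\nu) \le \mathbb 1_n$, and the third line uses Equation (23). Since $\|\cdot\|_\infty \le \|\cdot\|_{cb}$, Equation (24) immediately implies Equation (21).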

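We now have to convert these estimates, Equations (20) and (21), into statements about achievable rates for $T = \mathrm{id}_n$ and $S = \mathrm{id}_m$. Since $\mathrm{id}_n^{\otimes n_\nu} = \mathrm{id}_{n^{n_\nu}}$, Equations (20) and (21) apply with $n$ replaced by $n^{n_\nu}$ and $m$ replaced by $m^{m_\nu}$. So let $(n_\nu)_{\nu\in\mathbb N}$ and $(m_\nu)_{\nu\in\mathbb N}$ be two integer sequences such that $\lim_{\nu\to\infty} n_\nu = \infty$ and $\limsup_{\nu\to\infty} m_\nu/n_\nu < \mathrm{ld}\,n/\mathrm{ld}\,m$. Then for all sufficiently large $\nu$ we have $n^{n_\nu} \ge m^{m_\nu}$, and therefore $\Delta(n_\nu, m_\nu) = 0$. This implies that any $R < \mathrm{ld}\,n/\mathrm{ld}\,m$ is achievable.

On the other hand, let $R = \mathrm{ld}\,n/\mathrm{ld}\,m + \varepsilon$ for some $\varepsilon > 0$, and choose diverging sequences such that $\lim_{\nu\to\infty} m_\nu/n_\nu = R$. Then $m_\nu\,\mathrm{ld}\,m - n_\nu\,\mathrm{ld}\,n = n_\nu\big(\varepsilon\,\mathrm{ld}\,m + o(1)\big) \to \infty$, so $n^{n_\nu}/m^{m_\nu} \to 0$. By Equation (21), the lower bound $\Delta(n_\nu, m_\nu) \ge 1 - n^{n_\nu}/m^{m_\nu}$ therefore tends to 1. Hence the errors do not go to zero, and the rate $R$ is not achievable. To summarize, $Q(\mathrm{id}_n, \mathrm{id}_m) = \mathrm{ld}\,n/\mathrm{ld}\,m$ is the supremum of all achievable rates. ∎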
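The threshold behaviour in this argument can be tabulated. The sketch below uses the illustrative choice $n = 2$, $m = 3$, with rates chosen on either side of $\mathrm{ld}\,2/\mathrm{ld}\,3 \approx 0.631$. It evaluates only the bounds (20) and (21), not the simulation error $\Delta$ itself. The helper name `error_bound` is hypothetical.

```python
import math


def error_bound(n, m, n_nu, m_nu):
    """Bound on Delta(n_nu, m_nu) for id_n^{(x) n_nu} simulating id_m^{(x) m_nu}:
    0 if m^m_nu <= n^n_nu (Eq. (20)), otherwise the lower bound 1 - n^n_nu / m^m_nu
    (Eq. (21)). Base-two logarithms avoid astronomically large integers."""
    log_ratio = n_nu * math.log2(n) - m_nu * math.log2(m)
    if log_ratio >= 0:
        return 0.0
    return 1.0 - 2.0 ** log_ratio


n, m = 2, 3                              # qubit channels simulating qutrit channels
threshold = math.log2(n) / math.log2(m)  # Q(id_2, id_3) = ld 2 / ld 3, about 0.631
for rate in (0.5, 0.8):
    bounds = [error_bound(n, m, n_nu, math.floor(rate * n_nu)) for n_nu in (10, 100, 1000)]
    print(rate, rate < threshold, bounds)
```

Below the threshold the error is exactly zero from some block size on. Above it, the lower bound on the error approaches 1 exponentially fast in $n_\nu$.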

By the same techniques, one may also show that the capacity of the ideal channel 
does not increase if the information to be transmitted is restricted to be classical. 

3.3. Partial Transposition Bound 

The upper bound on the capacity of ideal channels can also be obtained from a general upper bound on quantum capacities, which has the virtue of being easily calculated in many situations. It involves, on each system considered, the transposition map $\Theta$, defined as matrix transposition with respect to some fixed orthonormal basis. None of the quantities we consider will depend on this basis. As is well known, transposition is positive but not completely positive. Similarly, we have $\|\Theta\|_\infty = 1$, but generally $\|\Theta\|_{cb} > 1$. More precisely, $\|\Theta\|_{cb} = d$ when the system is described on a $d$-dimensional Hilbert space. We claim that, for any channel $T$ and small $\varepsilon > 0$,

$$Q_\varepsilon(T) \le \mathrm{ld}\,\|T\Theta\|_{cb} =: Q_\Theta(T), \qquad (25)$$

where $Q_\varepsilon$ ($\ge Q(T)$) is the finite error capacity introduced in Section 2.7. In particular, for the ideal channel this implies $Q(\mathrm{id}_d) \le \mathrm{ld}\,d$.

The proof of Equation (25) is quite simple [12]. Suppose that $m_\nu/n_\nu \to R$, and that encodings $E_\nu$ and decodings $D_\nu$ satisfy $\Delta(n_\nu, m_\nu) = \|D_\nu T^{\otimes n_\nu}E_\nu - \mathrm{id}_2^{\otimes m_\nu}\|_{cb} < \varepsilon < 1$ for all large $\nu$. Then we have

$$\begin{aligned}
2^{m_\nu} = \|\mathrm{id}_2^{\otimes m_\nu}\Theta\|_{cb}
&\le \|(\mathrm{id}_2^{\otimes m_\nu} - D_\nu T^{\otimes n_\nu}E_\nu)\Theta\|_{cb} + \|D_\nu T^{\otimes n_\nu}E_\nu\Theta\|_{cb} \\
&\le \|\Theta_{2^{m_\nu}}\|_{cb}\,\|\mathrm{id}_2^{\otimes m_\nu} - D_\nu T^{\otimes n_\nu}E_\nu\|_{cb} + \|D_\nu (T\Theta)^{\otimes n_\nu}\,\Theta E_\nu\Theta\|_{cb} \\
&\le 2^{m_\nu}\,\Delta(n_\nu, m_\nu) + \|T\Theta\|_{cb}^{n_\nu}.
\end{aligned} \qquad (26)$$

In the last step we have used two facts. First, $D_\nu$ and $\Theta E_\nu\Theta$ are channels with cb-norm 1. Second, the cb-norm is exactly tensor multiplicative, so $\|X^{\otimes n}\|_{cb} = \|X\|_{cb}^n$. Taking the binary logarithm and dividing by $n_\nu$, we get

$$\frac{m_\nu}{n_\nu} + \frac{\mathrm{ld}\big(1 - \Delta(n_\nu, m_\nu)\big)}{n_\nu} \le \mathrm{ld}\,\|T\Theta\|_{cb}. \qquad (27)$$

Since $\mathrm{ld}(1 - \varepsilon)/n_\nu \to 0$, the limit $\nu\to\infty$ gives $R \le Q_\Theta(T)$ for every rate $R$ achievable with errors below $\varepsilon$. This proves Equation (25). ∎

The upper bound $Q_\Theta(T)$ computed in this way has some remarkable properties, which make it a capacity-like quantity in its own right. For example, it is exactly additive:

$$Q_\Theta(S\otimes T) = Q_\Theta(S) + Q_\Theta(T) \qquad (28)$$

for any pair $S$, $T$ of channels. It also satisfies the bottleneck inequality $Q_\Theta(ST) \le \min\{Q_\Theta(S), Q_\Theta(T)\}$. Moreover, it coincides with the quantum capacity on ideal channels, $Q_\Theta(\mathrm{id}_n) = Q(\mathrm{id}_n) = \mathrm{ld}\,n$, and it vanishes whenever $T\Theta$ is completely positive. In particular, if $\mathrm{id}\otimes T$ maps every entangled state to a state with positive partial transpose, we have $Q_\Theta(T) = 0$.
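As a numerical sanity check of the value $Q_\Theta(\mathrm{id}_d) = \mathrm{ld}\,d$, one can apply $\Theta\otimes\mathrm{id}$ to the maximally entangled projector $|\Omega\rangle\langle\Omega|$. The sketch below verifies that the result is $F/d$, where $F$ is the swap operator. Since $F$ is a real symmetric involution with $\mathrm{tr}\,F = d$, the result has $(d^2-d)/2$ negative eigenvalues and trace norm $d$. The input has trace norm 1, so this demonstrates the lower bound $\|\Theta\|_{cb} \ge \|\Theta\otimes\mathrm{id}_d\|_\infty \ge d$. The matching upper bound stated above is not checked here. The function names are illustrative.

```python
import math


def max_entangled_projector(d):
    """|Omega><Omega| on C^d (x) C^d, Omega = d^{-1/2} sum_k |k>|k>; basis index (a, b) -> a*d + b."""
    omega = [0.0] * (d * d)
    for k in range(d):
        omega[k * d + k] = 1 / math.sqrt(d)
    return [[x * y for y in omega] for x in omega]


def partial_transpose_first(x, d):
    """(Theta (x) id)(X): transpose the first tensor factor in the fixed product basis."""
    y = [[0.0] * (d * d) for _ in range(d * d)]
    for a in range(d):
        for b in range(d):
            for a2 in range(d):
                for b2 in range(d):
                    # <a b|X|a2 b2> becomes <a2 b|Y|a b2>
                    y[a2 * d + b][a * d + b2] = x[a * d + b][a2 * d + b2]
    return y


def check(d):
    y = partial_transpose_first(max_entangled_projector(d), d)
    # Is d * Y the swap operator F, <a b|F|c e> = delta(a, e) delta(b, c)?
    is_swap = all(abs(d * y[a * d + b][c * d + e] - (1.0 if (a == e and b == c) else 0.0)) < 1e-12
                  for a in range(d) for b in range(d) for c in range(d) for e in range(d))
    # F is a real symmetric involution, so its eigenvalues are +-1; tr F fixes how many are -1.
    trace_f = d * sum(y[i][i] for i in range(d * d))
    negative_eigenvalues = round((d * d - trace_f) / 2)
    trace_norm = d * d * (1 / d)   # d^2 eigenvalues of Y = F/d, each of modulus 1/d
    return is_swap, negative_eigenvalues, trace_norm


for d in (2, 3, 4):
    print(d, check(d))
```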



4. Alternative Error Criteria 



In this section we will show that the various distance measures introduced in Sections 2.3 through 2.6 are equivalent, as long as the reference channel is chosen to be noiseless, i.e., $S = \mathrm{id}_d$ for some $d < \infty$. Hence the remarkable insensitivity of the quantum capacity to the choice of the error criterion holds only for the capacity $Q(T)$, which is our main concern. It does not hold for the more general $Q(T, S)$ comparing two arbitrary channels. The reason for this difference is an analogous observation for distance measures on the state space. Different ways of quantifying the distance between states become inequivalent in the limit of large dimensions. However, all measures for the distance between a pure state and a general state essentially agree.



4.1. Preliminaries

The following lemma will serve as a starting point for showing the equivalence of fidelity, ordinary operator norm, and cb-norm criteria. For an operator $A \in B(\mathbb C^d)$ we denote by $\|A\|_1 := \mathrm{tr}\sqrt{A^*A}$ the trace norm, by $\|A\|_2 := \sqrt{\mathrm{tr}(A^*A)}$ the Hilbert-Schmidt norm, and by $\|A\|_\infty$ the ordinary operator norm. These norms are related by the following chain of inequalities:

$$\|A\|_\infty \le \|A\|_2 \le \|A\|_1 \qquad (29)$$

(see Chapter VI of [38] for a thorough discussion of these Schatten classes, these and other useful properties, and the relation to $\ell^p$-spaces). Of course, all norms in a finite-dimensional space are equivalent, so there must also be a bound in the reverse direction. This is

$$\|A\|_1 \le d\,\|A\|_\infty. \qquad (30)$$

The crucial difference between these estimates is that the bound in Equation (30) explicitly depends on the Hilbert space dimension $d$. This makes the inequality useless for applications in capacity theory, where dimensions grow exponentially. Our aim in this section is therefore to relate the various error measures by dimension-independent bounds (see Proposition 4.3 below).
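The dimension dependence in Equation (30) is easy to see on diagonal matrices, whose singular values are just the moduli of the diagonal entries. In the sketch below, the helper `schatten_norms` is illustrative and expects singular values as input rather than computing them. The identity on $\mathbb C^d$ with $d = 2^k$ saturates Equation (30). So the ratio $\|A\|_1/\|A\|_\infty$ grows exponentially in the number $k$ of qubits, which is exactly the regime of capacity theory.

```python
import math


def schatten_norms(singular_values):
    """(||A||_inf, ||A||_2, ||A||_1) computed from the singular values of A."""
    s = [abs(x) for x in singular_values]
    return max(s), math.sqrt(sum(x * x for x in s)), sum(s)


# A diagonal matrix: its singular values are the moduli of the diagonal entries.
op, hs, tr = schatten_norms([3.0, -1.0, 0.5])
print(op <= hs <= tr <= 3 * op)     # Eqs. (29) and (30) with d = 3: True

# The identity on C^d with d = 2^k shows that the factor d in Eq. (30) is attained.
for k in (1, 5, 10, 20):
    op, hs, tr = schatten_norms([1.0] * 2 ** k)
    print(k, tr / op)               # equals d = 2^k, i.e. exponential in k
```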
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Lemma 4.1 Let g be a density operator and be a unit vector in a Hilbert space Ti. 
Then 



Ile- Idilli < 2^/i- Mé#>, (3i) 

with the inequality being strict iff g is pure or ìp is orthogonal to the support of g. 

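Proof: Suppose first that $\rho = |\varphi\rangle\langle\varphi|$ is pure. Then we can compute the trace norm in the two-dimensional space spanned by $\varphi$ and $\psi$. For the moment we will only use this property, i.e., we assume that $\psi = (1, 0)$ is the first basis vector, and $\rho$ is an arbitrary $(2 \times 2)$ density matrix. Then we may expand the traceless operator $\rho - |\psi\rangle\langle\psi|$ in terms of the Pauli matrices $\{\sigma_i\}_{i=1,2,3}$, as follows:

$$\rho - |\psi\rangle\langle\psi| = (\rho_{11} - 1)\,\sigma_3 + \operatorname{Re}(\rho_{12})\,\sigma_1 - \operatorname{Im}(\rho_{12})\,\sigma_2. \qquad (32)$$

From this we find the eigenvalues of $\rho - |\psi\rangle\langle\psi|$ to equal $\pm\sqrt{(\rho_{11} - 1)^2 + |\rho_{12}|^2}$, and hence

$$\big\|\rho - |\psi\rangle\langle\psi|\big\|_1 = 2\sqrt{(\rho_{11} - 1)^2 + |\rho_{12}|^2}. \qquad (33)$$

Positivity of $\rho$ requires $\det\rho \ge 0$, implying $|\rho_{12}|^2 \le \rho_{11}\rho_{22} = \rho_{11}(1 - \rho_{11})$. Inserting this bound into Equation (33) gives $\|\rho - |\psi\rangle\langle\psi|\|_1 \le 2\sqrt{(1 - \rho_{11})^2 + \rho_{11}(1 - \rho_{11})} = 2\sqrt{1 - \rho_{11}}$, which is Equation (31) for $(2 \times 2)$ matrices. Since $\operatorname{tr}(\rho^2) = 1 + 2\big(\rho_{11}(\rho_{11} - 1) + |\rho_{12}|^2\big)$, equality holds in the determinant bound iff $\rho$ is pure. Inserting $|\rho_{12}|^2 = \rho_{11}(1 - \rho_{11})$ into Equation (33) thus shows that Equation (31) holds with equality for pure states.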

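We now drop the assumption that $\rho$ is pure and consider an arbitrary convex decomposition $\rho = \sum_i \lambda_i \rho_i$ into pure states $\rho_i$. Then because $x \mapsto \sqrt{1 - x}$ is concave we obtain:

$$\begin{aligned}
\big\|\rho - |\psi\rangle\langle\psi|\big\|_1 &\le \sum_i \lambda_i \big\|\rho_i - |\psi\rangle\langle\psi|\big\|_1 \\
&= 2\sum_i \lambda_i \sqrt{1 - \langle\psi|\rho_i|\psi\rangle} \\
&\le 2\sqrt{1 - \langle\psi|\rho|\psi\rangle},
\end{aligned} \qquad (34)$$

where in the second step the result for pure states has been used. This establishes Equation (31).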

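Now suppose that equality holds in Equation (34). Then because the concavity of $x \mapsto \sqrt{1 - x}$ is strict, $\langle\psi|\rho_i|\psi\rangle = \langle\psi|\rho|\psi\rangle$ for all $i$. But since the convex decomposition of $\rho$ is arbitrary, we may conclude that

$$|\langle\varphi|\psi\rangle|^2 = \langle\psi|\rho|\psi\rangle \qquad (35)$$

for any unit vector $\varphi$ in the support of $\rho$. By polarization this implies $\langle\varphi_1|\psi\rangle\langle\psi|\varphi_2\rangle = \langle\varphi_1|\varphi_2\rangle\langle\psi|\rho|\psi\rangle$ for all $\varphi_1, \varphi_2$ in the support of $\rho$, from which it follows that

$$S|\psi\rangle\langle\psi|S = \langle\psi|\rho|\psi\rangle\, S, \qquad (36)$$

where $S$ denotes the projection operator onto $\operatorname{supp}(\rho)$. Hence either the factor $\langle\psi|\rho|\psi\rangle$ vanishes, entailing $S\psi = 0$, or else $S$ is a rank one operator, and thus $\rho$ is pure. This concludes the proof. ■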
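Lemma 4.1 can be spot-checked on random inputs. The sketch below is a minimal illustration, assuming NumPy; the helper names are hypothetical. It draws density matrices of prescribed rank as $GG^*/\operatorname{tr}(GG^*)$ and compares both sides of Equation (31): rank one should give equality, higher ranks a strict inequality for a generic $\psi$.

```python
import numpy as np

def trace_norm_hermitian(X):
    """Trace norm of a Hermitian matrix: sum of the absolute eigenvalues."""
    return float(np.abs(np.linalg.eigvalsh(X)).sum())

def random_density(d, rank, rng):
    """Random density matrix of the given rank, rho = G G^* / tr(G G^*)."""
    G = rng.normal(size=(d, rank)) + 1j * rng.normal(size=(d, rank))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real

def lemma_41_sides(rho, psi):
    """Left- and right-hand side of Eq. (31) for a density matrix rho and a unit vector psi."""
    lhs = trace_norm_hermitian(rho - np.outer(psi, psi.conj()))
    overlap = np.vdot(psi, rho @ psi).real              # <psi|rho|psi>
    rhs = 2 * np.sqrt(max(0.0, 1.0 - overlap))
    return lhs, float(rhs)

rng = np.random.default_rng(1)
d = 4
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
for rank in (1, 2, 4):
    print(rank, lemma_41_sides(random_density(d, rank, rng), psi))
```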

From Lemma 4.1 we may derive a fidelity-based bound on the deviation of a given channel from the ideal channel:
Lemma 4.2 Let $\mathcal{H}$ be a Hilbert space, $\dim\mathcal{H} < \infty$, and $T: \mathcal{B}_*(\mathcal{H}) \to \mathcal{B}_*(\mathcal{H})$ be a channel. We then have

$$\|T - \mathrm{id}\|_\infty \le 4\sup_{\|\psi\| = 1}\sqrt{1 - \langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle}. \qquad (37)$$

Proof: Note that the operator norm $\|T - \mathrm{id}\|_\infty$ equals the norm of the adjoint operator on the dual space, i.e.,

$$\|T - \mathrm{id}\|_\infty = \sup_{\|\rho\|_1 \le 1}\|T_*\rho - \rho\|_1 \qquad (38)$$

(cf. Chapter VI of [38] or Section 2.4 of […] for details). Any matrix $\rho$ with $\|\rho\|_1 \le 1$ has a decomposition $\rho = \rho_1 + i\rho_2$ into Hermitian $\rho_j$ satisfying $\|\rho_j\|_1 \le 1$. Inserting this decomposition into Equation (38) and using the triangle inequality, we find $\|T - \mathrm{id}\|_\infty \le 2\sup\|T_*\rho - \rho\|_1$, where the supremum is now over all Hermitian matrices $\rho$ obeying $\|\rho\|_1 \le 1$.

By spectral decomposition, any Hermitian matrix $\rho$ can be given the form $\rho = \sum_i r_i \rho_i$, where the $\rho_i$ are rank one projectors and the coefficients $r_i$ are real numbers satisfying $\sum_i |r_i| = \|\rho\|_1$. Inserting this into the supremum, we see that $\|T - \mathrm{id}\|_\infty \le 2\sup\|T_*\rho - \rho\|_1$, where the optimization is now with respect to all one-dimensional projectors $\rho = |\psi\rangle\langle\psi|$. The inequality then directly follows from Lemma 4.1. ■
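The two reduction steps in this proof (splitting an arbitrary $\rho$ into Hermitian parts, then expanding a Hermitian part into rank-one projectors with $\sum_i |r_i| = \|\rho\|_1$) can be illustrated on a random matrix. A minimal NumPy sketch; the function names are hypothetical.

```python
import numpy as np

def trace_norm(X):
    """Trace norm: sum of singular values."""
    return float(np.linalg.svd(X, compute_uv=False).sum())

def hermitian_parts(rho):
    """Split rho = rho1 + i*rho2 with rho1 and rho2 Hermitian."""
    rho1 = (rho + rho.conj().T) / 2
    rho2 = (rho - rho.conj().T) / 2j
    return rho1, rho2

def rank_one_expansion(h):
    """Spectral decomposition h = sum_i r_i P_i into rank-one projectors P_i."""
    r, vecs = np.linalg.eigh(h)
    return r, [np.outer(vecs[:, i], vecs[:, i].conj()) for i in range(len(r))]

rng = np.random.default_rng(2)
d = 5
rho = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho /= trace_norm(rho)                      # an arbitrary matrix with ||rho||_1 = 1
rho1, rho2 = hermitian_parts(rho)
r, P = rank_one_expansion(rho1)
print(trace_norm(rho1), trace_norm(rho2), np.abs(r).sum())
```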



4.2. Four Equivalent Distance Measures

We now have in hand all the tools we need to prove that the distance measures presented in Sections 2.3 through 2.6 do indeed coincide:



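Proposition 4.3 Let $\mathcal{H}$ be a Hilbert space, $\dim\mathcal{H} < \infty$, and let $T: \mathcal{B}_*(\mathcal{H}) \to \mathcal{B}_*(\mathcal{H})$ be a channel. Then

$$\begin{aligned}
1 - \inf_{\rho \in \mathcal{B}_*(\mathcal{H})} F_e(\rho, T) &\le 4\sqrt{1 - F(T)} \\
&\le 4\sqrt{\|T - \mathrm{id}\|_\infty} \\
&\le 4\sqrt{\|T - \mathrm{id}\|_{cb}} \\
&\le 8\,\sqrt[4]{1 - \inf_{\rho \in \mathcal{B}_*(\mathcal{H})} F_e(\rho, T)}.
\end{aligned} \qquad (39)$$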

These are the dimension-independent bounds we need: if a sequence of channels becomes close to ideal in the sense of any of the error measures appearing in this proposition, so it will be in terms of all the others. The equivalence of the basic capacity definition 2.1 based on the cb-norm and the definitions based on minimum fidelity and entanglement fidelity, as presented in Sections 2.3 and 2.5, then directly follows.

It is crucial for Proposition 4.3 that we are considering only the deviation of $T$ from the ideal channel, so we can use Lemma 4.1 for the distance between an output state and a pure state. Therefore, for the general capacity $Q(T, S)$ the choice of the error quantity may remain important. General properties such as superadditivity (16), which are easy to see for the cb-norm criterion, might therefore fail for the simpler-looking operator norm $\|T - S\|_\infty$. This is the principal reason for choosing the cb-norm in the basic definition.
Mean fidelity, as used in Section 2.4, and channel fidelity, as introduced in Section 2.5, are conspicuously absent from Proposition 4.3. Their role will be discussed in Section 4.3.

The equivalence of Schumacher's original definition of channel capacity in terms of the entropy rate will then be treated in Section 4.4.

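Proof of Proposition 4.3: Let $\varphi \in \mathcal{H} \otimes \mathcal{H}$ be a purification of $\rho \in \mathcal{B}_*(\mathcal{H})$. We then have

$$1 - F_e(\rho, T) = \langle\varphi|\big((\mathrm{id} - T) \otimes \mathrm{id}\big)(|\varphi\rangle\langle\varphi|)|\varphi\rangle. \qquad (40)$$

By Schmidt decomposition, $\varphi$ can be given a representation $|\varphi\rangle = \sum_j \lambda_j |j\rangle \otimes |j'\rangle$, where $\{|j\rangle\}_j$ and $\{|j'\rangle\}_j$ are orthonormal systems in $\mathcal{H}$, and the so-called Schmidt coefficients $\{\lambda_j\}_j$ are non-negative real numbers satisfying $\sum_j \lambda_j^2 = 1$. Inserting this representation into Equation (40), we see that

$$\begin{aligned}
1 - F_e(\rho, T) &= \sum_{j,k}\lambda_j^2\lambda_k^2\,\langle j|(\mathrm{id} - T)(|j\rangle\langle k|)|k\rangle \\
&\le \sum_{j,k}\lambda_j^2\lambda_k^2\,\|\mathrm{id} - T\|_\infty\,\big\||j\rangle\langle k|\big\|_\infty \\
&= \|\mathrm{id} - T\|_\infty,
\end{aligned} \qquad (41)$$

where in the last step the normalization $\sum_j \lambda_j^2 = 1$ has been applied.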
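As a sanity check on Equations (40) and (41), and on the Kraus-operator formula $F_e(\rho, T) = \sum_i |\operatorname{tr} t_i\rho|^2$ that is used later (Equation (7)), one can compute the entanglement fidelity three ways for a random channel acting on density matrices. This is a minimal NumPy sketch under stated assumptions: the channel is a hypothetical one obtained by slicing a random isometry into Kraus operators, the Schmidt basis is taken to be the computational basis, and all helper names are ours.

```python
import numpy as np

def random_kraus(d, n_ops, rng):
    """Kraus operators of a random channel on C^d, cut from a random isometry C^d -> C^(n_ops*d)."""
    G = rng.normal(size=(n_ops * d, d)) + 1j * rng.normal(size=(n_ops * d, d))
    V, _ = np.linalg.qr(G)
    return [V[i * d:(i + 1) * d, :] for i in range(n_ops)]

def apply_channel(kraus, X):
    """Action X -> sum_i t_i X t_i^* on (not necessarily positive) matrices."""
    return sum(t @ X @ t.conj().T for t in kraus)

def fe_kraus(rho, kraus):
    """F_e(rho, T) = sum_i |tr(t_i rho)|^2."""
    return sum(abs(np.trace(t @ rho)) ** 2 for t in kraus)

def fe_purification(lam, kraus):
    """<phi|(T x id)(|phi><phi|)|phi> for |phi> = sum_j lam_j |j>|j>, cf. Eq. (40)."""
    d = len(lam)
    phi = np.zeros(d * d, dtype=complex)
    for j in range(d):
        phi[j * d + j] = lam[j]
    out = apply_channel([np.kron(t, np.eye(d)) for t in kraus], np.outer(phi, phi.conj()))
    return float(np.vdot(phi, out @ phi).real)

def fe_schmidt(lam, kraus):
    """sum_{j,k} lam_j^2 lam_k^2 <j|T(|j><k|)|k>, the form used in Eq. (41)."""
    d = len(lam)
    total = 0j
    for j in range(d):
        for k in range(d):
            E_jk = np.zeros((d, d))
            E_jk[j, k] = 1.0
            total += lam[j] ** 2 * lam[k] ** 2 * apply_channel(kraus, E_jk)[j, k]
    return float(total.real)

rng = np.random.default_rng(3)
d = 3
kraus = random_kraus(d, 2, rng)
p = rng.random(d)
p /= p.sum()
lam = np.sqrt(p)                          # Schmidt coefficients with sum lam_j^2 = 1
rho = np.diag(p).astype(complex)          # reduced state of phi
print(fe_kraus(rho, kraus), fe_purification(lam, kraus), fe_schmidt(lam, kraus))
```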

The first inequality then immediately follows from Lemma 4.2 and the definition of minimum fidelity, Equation (4).

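An application of the Schwarz inequality directly gives the second inequality:

$$\begin{aligned}
1 - \langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle &= \langle\psi|(\mathrm{id} - T)(|\psi\rangle\langle\psi|)|\psi\rangle \\
&\le \|\mathrm{id} - T\|_\infty\,\big\||\psi\rangle\langle\psi|\big\|_\infty \\
&= \|T - \mathrm{id}\|_\infty
\end{aligned} \qquad (42)$$

for all unit vectors $\psi \in \mathcal{H}$.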

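The third inequality is obvious from the definition of the cb-norm, so we only need to prove the last step. Applying Lemma 4.2 to the operator $T \otimes \mathrm{id}_n$ and then taking the supremum over $n$ on both sides, we see that

$$\begin{aligned}
\|T - \mathrm{id}\|_{cb} &\le 4\sqrt{1 - \inf_n F(T \otimes \mathrm{id}_n)} \\
&= 4\sqrt{1 - \inf_{\rho \in \mathcal{B}_*(\mathcal{H})} F_e(\rho, T)},
\end{aligned} \qquad (43)$$

where in the last step we have used that the pure input states of $T \otimes \mathrm{id}_n$ are purifications of density operators on $\mathcal{H}$, concluding the proof. ■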

4.3. Average Fidelity and Channel Fidelity

Average fidelity and channel fidelity have been shown [40, 41] to be directly related error criteria:

Proposition 4.4 Let $\bar F(T)$ be the average fidelity and $F_c(T)$ be the channel fidelity of a quantum channel $T: \mathcal{B}_*(\mathcal{H}) \to \mathcal{B}_*(\mathcal{H})$, as introduced in Sections 2.4 and 2.5, respectively. We then have:

$$\bar F(T) = \frac{d\,F_c(T) + 1}{d + 1}, \qquad (44)$$

where $d$ is the dimension of the underlying Hilbert space $\mathcal{H}$.
From Proposition 4.4 we may conclude that both quantities coincide in the large dimension limit $d \to \infty$. Consequently, average fidelity and channel fidelity are equivalent error criteria for capacity purposes.

However, neither appears in Proposition 4.3. After giving a somewhat simplified proof of Equation (44), we show by an explicit counterexample that this omission is not accidental. Since a coding for which the worst case fidelity goes to 1 also makes the average fidelity go to 1, the capacity defined with average fidelity might in principle be larger than the standard one. That these capacities nevertheless coincide will then follow from Proposition 4.5 in Section 4.4. A more direct proof for the equivalence of average fidelity, i.e., a proof not making use of Proposition 4.4, is then presented in Section 6.2.

Proof of Proposition 4.4: Suppose that $\{t_i\}_i$ is a set of Kraus operators for the quantum channel $T$, i.e., $T(\sigma) = \sum_i t_i\sigma t_i^*$ for all $\sigma \in \mathcal{B}_*(\mathcal{H})$.

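In the course of the proof we will repeatedly employ the so-called flip operator $\mathbb{F} \in \mathcal{B}(\mathcal{H}) \otimes \mathcal{B}(\mathcal{H})$, defined by $\mathbb{F}(\varphi \otimes \psi) := \psi \otimes \varphi$. In a basis $\{|n\rangle\}_{n=1,\dots,d}$ of $\mathcal{H}$, $d := \dim\mathcal{H}$, this corresponds to the representation

$$\mathbb{F} = \sum_{n,m=1}^{d} |n, m\rangle\langle m, n|. \qquad (45)$$

Working in this representation one easily verifies that for all operators $A, B \in \mathcal{B}(\mathcal{H})$

$$\operatorname{tr}\mathbb{F}(A \otimes B) = \operatorname{tr}AB. \qquad (46)$$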
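The flip operator (45) and the identity (46) can be checked directly. The sketch below assumes NumPy, orders the product basis as $|n, m\rangle \mapsto nd + m$, and uses hypothetical function names. It also solves the two trace conditions $\operatorname{tr}P = 1$ and $\operatorname{tr}\mathbb{F}P = 1$ for $P = \alpha\mathbb{1} + \beta\mathbb{F}$, using $\operatorname{tr}\mathbb{1} = d^2$, $\operatorname{tr}\mathbb{F} = d$ and $\mathbb{F}^2 = \mathbb{1}$; this is the computation behind the Werner-state coefficients used below.

```python
import numpy as np

def flip(d):
    """Flip operator F = sum_{n,m} |n,m><m,n| on C^d (x) C^d, cf. Eq. (45)."""
    F = np.zeros((d * d, d * d))
    for n in range(d):
        for m in range(d):
            F[n * d + m, m * d + n] = 1.0
    return F

def werner_coefficients(d):
    """Solve tr(P) = 1 and tr(F P) = 1 for P = alpha*1 + beta*F.

    With tr(1) = d^2, tr(F) = d and F^2 = 1 one has
    tr(P) = alpha d^2 + beta d and tr(F P) = alpha d + beta d^2.
    """
    M = np.array([[d * d, d], [d, d * d]], dtype=float)
    return np.linalg.solve(M, np.ones(2))

rng = np.random.default_rng(4)
d = 3
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
print(np.isclose(np.trace(flip(d) @ np.kron(A, B)), np.trace(A @ B)))   # Eq. (46)
print(werner_coefficients(d), 1 / (d * (d + 1)))
```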
In terms of the Kraus operators $\{t_i\}_i$ the average fidelity of $T$ then reads

$$\begin{aligned}
\bar F(T) &= \int \langle U\psi|T(|U\psi\rangle\langle U\psi|)|U\psi\rangle\, dU \\
&= \sum_i \operatorname{tr}\int t_i^*\, U\rho U^*\, t_i\, U\rho U^*\, dU \\
&= \sum_i \operatorname{tr}\mathbb{F}(t_i^* \otimes t_i)\int (U \otimes U)(\rho \otimes \rho)(U \otimes U)^*\, dU,
\end{aligned} \qquad (47)$$

where $\rho := |\psi\rangle\langle\psi|$ is an arbitrary pure reference state, integration is over all unitaries $U \in \mathcal{B}(\mathcal{H})$ with respect to the normalized Haar measure, and in the last step we have applied Equation (46). The second factor under the trace,

$$P(\rho) := \int (U \otimes U)(\rho \otimes \rho)(U \otimes U)^*\, dU, \qquad (48)$$

is obviously invariant under unitary transformations of the form $V \otimes V$, i.e., $[P(\rho), V \otimes V] = 0$ for all unitary operators $V \in \mathcal{B}(\mathcal{H})$. Such a state is usually called a Werner state [21], and it follows from the theory of group representations that these states are spanned by the identity operator and the flip operator, $P(\rho) = \alpha\mathbb{1} + \beta\mathbb{F}$ with complex coefficients $\alpha, \beta$ (see […] and Chapter 3.1.2 of [11] for details). The coefficients can be easily obtained by tracing $P(\rho)$ with the identity and flip operator, respectively, and are both found to equal $1/(d(d+1))$. Inserting the expansion
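$$P(\rho) = \frac{\mathbb{1} + \mathbb{F}}{d(d+1)} \qquad (49)$$

into Equation (47) and making again use of Equation (46), we see that

$$\begin{aligned}
\bar F(T) &= \frac{1}{d(d+1)}\sum_i \operatorname{tr}\mathbb{F}(t_i^* \otimes t_i)(\mathbb{1} + \mathbb{F}) \\
&= \frac{1}{d(d+1)}\sum_i \big(\operatorname{tr}t_i^* t_i + |\operatorname{tr}t_i|^2\big) \\
&= \frac{1}{d(d+1)}\Big(d + \sum_i |\operatorname{tr}t_i|^2\Big) \\
&= \frac{1}{d+1}\big(1 + d\,F_c(T)\big),
\end{aligned} \qquad (50)$$

where in the second step we have used the normalization $\sum_i t_i^* t_i = \mathbb{1}$, and in the third step Equation (7) has been applied for the state $\rho = \mathbb{1}/d$. ■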
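Equation (44) can also be checked by brute force: sample uniformly distributed pure states, average the pure-state fidelity $\sum_i |\langle\psi|t_i|\psi\rangle|^2$, and compare with $(dF_c + 1)/(d+1)$, where $F_c = \sum_i |\operatorname{tr}t_i|^2/d^2$. This is a Monte Carlo sketch assuming NumPy; it relies on the standard fact that normalized complex Gaussian vectors are distributed according to the unitarily invariant measure, and the random channel and helper names are hypothetical.

```python
import numpy as np

def random_kraus(d, n_ops, rng):
    """Kraus operators of a random channel on C^d, cut from a random isometry."""
    G = rng.normal(size=(n_ops * d, d)) + 1j * rng.normal(size=(n_ops * d, d))
    V, _ = np.linalg.qr(G)
    return [V[i * d:(i + 1) * d, :] for i in range(n_ops)]

def channel_fidelity(kraus):
    """F_c(T) = sum_i |tr t_i|^2 / d^2, i.e. the Kraus formula at rho = 1/d."""
    d = kraus[0].shape[0]
    return float(sum(abs(np.trace(t)) ** 2 for t in kraus) / d**2)

def average_fidelity_mc(kraus, n_samples, rng):
    """Monte Carlo estimate of the average of <psi|T(|psi><psi|)|psi> over pure states."""
    d = kraus[0].shape[0]
    # Normalized complex Gaussian vectors are uniformly distributed pure states
    psi = rng.normal(size=(n_samples, d)) + 1j * rng.normal(size=(n_samples, d))
    psi /= np.linalg.norm(psi, axis=1, keepdims=True)
    K = np.stack(kraus)                                       # shape (n_ops, d, d)
    amps = np.einsum('sa,kab,sb->sk', psi.conj(), K, psi)     # <psi_s|t_k|psi_s>
    return float(np.mean(np.sum(np.abs(amps) ** 2, axis=1)))

rng = np.random.default_rng(5)
d = 3
kraus = random_kraus(d, 3, rng)
F_c = channel_fidelity(kraus)
print(average_fidelity_mc(kraus, 200_000, rng), (d * F_c + 1) / (d + 1))
```

The sample size is a trade-off: the statistical error of the estimate shrinks like $1/\sqrt{n_{\text{samples}}}$, so a few hundred thousand samples already resolve the difference between $\bar F$ and $F_c$ for small $d$.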



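We proceed with the advertised
Counterexample: For $\rho \in \mathcal{B}_*(\mathbb{C}^d)$, we set

$$T(\rho) = P_+\rho P_+ + P_-\rho P_-, \qquad (51)$$

where $P_+ := |\psi_1\rangle\langle\psi_1|$ is some one-dimensional projector and $P_- := \mathbb{1} - P_+$ its orthocomplement. Then by Equation (7) we find

$$F_c(T) = \frac{1}{d^2}\sum_i |\operatorname{tr}t_i|^2 = \frac{1 + (d-1)^2}{d^2}, \qquad (52)$$

and therefore $\lim_{d\to\infty}\bar F(T) = \lim_{d\to\infty}F_c(T) = 1$, the first equality by Equation (44). However, using Equation (38) we have

$$\|T - \mathrm{id}\|_\infty = \sup_{\|\sigma\|_1 \le 1}\|T_*\sigma - \sigma\|_1 \ge \|T_*\rho - \rho\|_1 \quad \forall\rho \in \mathcal{B}_*(\mathbb{C}^d),\ \|\rho\|_1 \le 1, \qquad (53)$$

and by choosing $\rho = \frac{1}{2}|\psi_1 + \psi_2\rangle\langle\psi_1 + \psi_2|$ with a unit vector $\psi_2 \perp \psi_1$, $\|T - \mathrm{id}\|_\infty$ can easily be shown to be nonzero and independent of $d$: since the Kraus operators $P_\pm$ are self-adjoint, $T_*$ acts by the same formula (51), so $T_*\rho - \rho = -\frac{1}{2}\big(|\psi_1\rangle\langle\psi_2| + |\psi_2\rangle\langle\psi_1|\big)$ and $\|T - \mathrm{id}\|_\infty \ge 1$. Hence there exists no bound of the form $\|T - \mathrm{id}\|_\infty \le f(F_c(T))$ with a dimension-independent function $f$ such that $x \to 1$ implies $f(x) \to 0$.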
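The two quantities in this counterexample can be evaluated explicitly for several dimensions. A minimal NumPy sketch, assuming $\psi_1 = e_0$ and $\psi_2 = e_1$ (an arbitrary but admissible choice); the function name is hypothetical. The channel fidelity approaches 1 as $d$ grows, while the trace-norm deviation on $\rho = \frac{1}{2}|\psi_1 + \psi_2\rangle\langle\psi_1 + \psi_2|$ stays at 1.

```python
import numpy as np

def counterexample(d):
    """Return (F_c(T), ||T(rho) - rho||_1) for the channel of Eq. (51).

    Assumes psi_1 = e_0 and psi_2 = e_1, so P_+ = |e_0><e_0| and
    rho = |e_0 + e_1><e_0 + e_1| / 2.
    """
    P_plus = np.zeros((d, d))
    P_plus[0, 0] = 1.0
    kraus = [P_plus, np.eye(d) - P_plus]
    F_c = sum(np.trace(t) ** 2 for t in kraus) / d**2     # Kraus formula at rho = 1/d
    v = np.zeros(d)
    v[0] = v[1] = 1 / np.sqrt(2)
    rho = np.outer(v, v)
    out = sum(t @ rho @ t for t in kraus)                  # P_+ and P_- are real symmetric projectors
    dist = np.linalg.svd(out - rho, compute_uv=False).sum()
    return float(F_c), float(dist)

for d in (2, 10, 100):
    print(d, counterexample(d))
```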



4.4. Entanglement Fidelity and Entropy Rate

Let us briefly summarize what we have learned so far about the interrelation of the various distance measures introduced in Section 2. From Proposition 4.3 and the results of the previous section we may infer the existence of two classes of equivalent error criteria, one of them containing average and channel fidelity, the other one cb-norm distance, operator norm, minimum fidelity, and entanglement fidelity.

In order to show that both classes lead to the same quantum channel capacity, we will have to construct, from a given coding scheme with rate $R$ and channel fidelity approaching one, a sequence of Hilbert spaces $(\mathcal{K}_n)_{n\in\mathbb{N}}$, all pure states of which may be sent reliably with rate $R$. This is the essence of the following proposition, which closely follows the argument presented in Section V of [7]. Although for this purpose we only need to consider channel fidelities, and thus the chaotic density operator, the statement is kept more general to apply to all density matrices, since this will immediately allow us to cope with Schumacher's definition of channel capacity in terms of entropy rates as well.

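Proposition 4.5 Let $\mathcal{H}$ be a Hilbert space with $d := \dim\mathcal{H} < \infty$. Let $T: \mathcal{B}_*(\mathcal{H}) \to \mathcal{B}_*(\mathcal{H})$ be a channel, and $\rho \in \mathcal{B}_*(\mathcal{H})$ a density operator. Then, for a suitable $k$-dimensional projection $P_k \in \mathcal{B}(\mathcal{H})$, and for the "compressed channel" $T_k: \mathcal{B}_*(P_k\mathcal{H}) \to \mathcal{B}_*(P_k\mathcal{H})$ given by

$$T_k(\sigma) := P_k T(\sigma) P_k + \operatorname{tr}\big((\mathbb{1} - P_k)T(\sigma)\big)\,\frac{P_k}{k}, \qquad (54)$$

the estimate

$$F(P_k\mathcal{H}, T_k) \ge 1 - \frac{1 - F_e(\rho, T)}{1 - q^*} \qquad (55)$$

holds with both

$$q^* = k\,\|\rho\|_\infty \qquad (56)$$

and

$$q^* = \frac{1 + \operatorname{ld}d - S(\rho)}{\operatorname{ld}d - \operatorname{ld}k}, \qquad (57)$$

where $S$ again denotes the von Neumann entropy.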

Proof: The idea of the proof is to recursively remove dimensions of low fidelity from the support of $\rho$ until we are left with a Hilbert space of given dimension $k$ and a minimum pure state fidelity bounded from below in terms of $F_e(\rho, T)$. To this end, define

$$f: \mathcal{H} \to \mathbb{R}, \quad \psi \mapsto \langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle. \qquad (58)$$

Setting $d := \dim\operatorname{supp}(\rho)$ and $\rho_0 := \rho$, we now recursively define a collection $\{\rho_i\}_{i=0,\dots,d}$ of positive operators, as follows:

$$\rho_i := \rho_{i-1} - q_i\,|\varphi_i\rangle\langle\varphi_i|, \qquad (59)$$

where $\varphi_i$ is the state vector in the support of $\rho_{i-1}$ that minimizes $f$, and $q_i$ is the largest positive number that leaves $\rho_i$ positive. Note that since $\dim\mathcal{H}$ is finite, $q_i$ can be chosen to be strictly positive. By construction, $\operatorname{supp}(\rho_i) \subset \operatorname{supp}(\rho_{i-1})$, and $\operatorname{rank}(\rho_i) = \operatorname{rank}(\rho_{i-1}) - 1$; so our procedure removes dimensions from the support of $\rho$ one by one. It follows that

$$\rho = \sum_{i=1}^{d} q_i\,|\varphi_i\rangle\langle\varphi_i|, \qquad (60)$$

implying $\sum_{i=1}^{d} q_i = \operatorname{tr}(\rho) = 1$.
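The peeling step (59) and the resulting decomposition (60) can be sketched numerically. The weight $q_i$ is computed with the standard formula $q = 1/\langle\varphi|\rho^{+}|\varphi\rangle$ ($\rho^{+}$ the inverse of $\rho$ on its support), which the text does not spell out. As a loudly labeled simplification, each $\varphi_i$ is a random unit vector in the current support rather than the minimizer of $f$ from Equation (58), since that minimization is not needed to see the rank reduction and the bound $q_i \le \|\rho\|_\infty$. NumPy is assumed; function names are hypothetical.

```python
import numpy as np

def max_weight(rho, phi, tol=1e-10):
    """Largest q >= 0 with rho - q|phi><phi| >= 0, for a unit vector phi in supp(rho).

    Uses q = 1 / <phi|rho^+|phi>, with rho^+ the inverse of rho on its support.
    """
    ev, U = np.linalg.eigh(rho)
    keep = ev > tol
    amps = U[:, keep].conj().T @ phi          # components of phi in the eigenbasis of rho
    return 1.0 / np.sum(np.abs(amps) ** 2 / ev[keep])

def decompose(rho, rng, tol=1e-10):
    """Repeat the step of Eq. (59) until rho is used up, giving Eq. (60).

    Hypothetical simplification: phi_i is a random unit vector in supp(rho_{i-1}),
    not the minimizer of f used in the proof.
    """
    weights, vectors = [], []
    current = rho.copy()
    for _ in range(rho.shape[0]):
        ev, U = np.linalg.eigh(current)
        support = U[:, ev > tol]               # orthonormal basis of supp(current)
        if support.shape[1] == 0:
            break
        c = rng.normal(size=support.shape[1]) + 1j * rng.normal(size=support.shape[1])
        phi = support @ c
        phi /= np.linalg.norm(phi)
        q = max_weight(current, phi)
        current = current - q * np.outer(phi, phi.conj())   # rank drops by one
        weights.append(q)
        vectors.append(phi)
    return np.array(weights), vectors
```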

Using the convexity of entanglement fidelity in the density operator input, we see that

$$\begin{aligned}
F_e(\rho, T) &= F_e\Big(\sum_{i=1}^{d} q_i|\varphi_i\rangle\langle\varphi_i|, T\Big) \\
&\le \sum_{i=1}^{d} q_i f(\varphi_i) \\
&\le f(\varphi_{d-k})\sum_{i=1}^{d-k} q_i + \sum_{i=d-k+1}^{d} q_i,
\end{aligned} \qquad (61)$$

where in the third line we have used that $i \mapsto f(\varphi_i)$ is non-decreasing by construction. We now take the subspace $P_k\mathcal{H}$ as the span of all vectors $\{\varphi_i\}_{i=d-k+1,\dots,d}$. Then, since $\langle\psi|T_k(|\psi\rangle\langle\psi|)|\psi\rangle \ge \langle\psi|T(|\psi\rangle\langle\psi|)|\psi\rangle$ for $\psi \in P_k\mathcal{H}$, we have $F(P_k\mathcal{H}, T_k) \ge f(\varphi_{d-k})$. Introducing $q^* = \sum_{i=d-k+1}^{d} q_i$ and using $\sum_{i=1}^{d-k} q_i = 1 - q^*$, we immediately have the desired estimate.

Our remaining task is to give upper bounds on $q^*$, either in terms of the largest eigenvalue of $\rho$, or its entropy. Note that from the sum representing $\rho$ in Equation (60) we have

$$q_i \le q_i + \sum_{j \ne i} q_j\,|\langle\varphi_j|\varphi_i\rangle|^2 = \langle\varphi_i|\rho|\varphi_i\rangle \le \|\rho\|_\infty. \qquad (62)$$

Therefore, each of the $k$ terms in $q^*$ is bounded by $\|\rho\|_\infty$, and we get $q^* \le k\|\rho\|_\infty$, which gives the first estimate.

For the entropic estimate, note first that in the inequality

$$S\Big(\sum_i q_i\sigma_i\Big) \le \sum_i q_i S(\sigma_i) - \sum_i q_i\operatorname{ld}q_i, \qquad (63)$$

which is valid for arbitrary convex combinations of states $\sigma_i$ with weights $q_i$ (cf. Chapter 11.3.6 of [33]), the case of pure states leaves just the entropy of the probability distribution $q$. On the other hand, it is obvious that among all probability distributions with given weight $q^*$ for the last group of $k$ indices, the one with the highest entropy is equidistribution in each of the ranges $1 \le i \le d - k$ and $d - k + 1 \le i \le d$. Evaluating the entropy of this distribution, and combining this with the previous estimate, we find

$$\begin{aligned}
S(\rho) &\le H_2(q^*) + q^*\operatorname{ld}k + (1 - q^*)\operatorname{ld}(d - k) \\
&\le 1 + \operatorname{ld}d - q^*(\operatorname{ld}d - \operatorname{ld}k),
\end{aligned} \qquad (64)$$

where the first term denotes the binary Shannon entropy,

$$H_2(q^*) = -q^*\operatorname{ld}q^* - (1 - q^*)\operatorname{ld}(1 - q^*) \le 1, \qquad (65)$$

and we have also used $\operatorname{ld}(d - k) \le \operatorname{ld}d$. Hence the result follows by writing this as an upper bound for $q^*$. ■
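The chain (64) is a statement about classical probability distributions and can be tested directly on random distributions $q$, together with its rearrangement into the bound (57) on $q^*$. A minimal NumPy sketch; "ld" is the base-2 logarithm, and the helper names are hypothetical.

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits (ld = log base 2), ignoring zero entries."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_chain(q, k):
    """Evaluate the three quantities compared in Eq. (64) for a distribution q.

    q* is the total weight of the last k entries. Returns (H(q), middle, right, q*).
    """
    d = len(q)
    q_star = float(np.sum(q[d - k:]))
    middle = (shannon([q_star, 1 - q_star]) + q_star * np.log2(k)
              + (1 - q_star) * np.log2(d - k))
    right = 1 + np.log2(d) - q_star * (np.log2(d) - np.log2(k))
    return shannon(q), float(middle), float(right), q_star

rng = np.random.default_rng(9)
q = rng.random(16)
q /= q.sum()
print(entropy_chain(q, k=4))
```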

Proposition 4.5 allows us to make the transition from average error criteria and entropy rates to maximal error criteria. So let us assume that a coding scheme $(E_n, D_n)_{n\in\mathbb{N}}$ is given, together with a sequence $(\rho_n)_{n\in\mathbb{N}}$ of source states, such that $F_e(\rho_n, D_n T^{\otimes n} E_n) \to 1$. Then the compressed channel $T_k$, obtained by applying Proposition 4.5 to $D_n T^{\otimes n} E_n$, will again be a corrected version of $T^{\otimes n}$, but we can now conclude that its worst case fidelity goes to one.

Let us first consider the case in which the source does not appear explicitly, i.e., in which we assume either the mean fidelity or, equivalently, the channel fidelity to go to one for a scheme with rate $R$. Since the channel fidelity is just the entanglement fidelity with respect to a maximally mixed $\rho$, we may apply Proposition 4.5 with $\rho_n = \mathbb{1}/\dim\mathcal{H}_n$ and $\dim\mathcal{H}_n = 2^{\lfloor nR\rfloor}$, where conventionally by $\lfloor x\rfloor$ (read: floor of $x$) we denote the largest integer no larger than $x$. We set $k = \dim\mathcal{H}_n/2$, which is to say that the modified coding scheme corrects just one qubit less than the original one. Then $k\|\rho_n\|_\infty = 1/2$, and Equation (55) immediately shows that the minimum fidelity is at least $1 - 2(1 - F_e)$, and hence also goes to 1.
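For concreteness, the arithmetic of this first case can be spelled out: evaluating the right-hand side of Equation (55) at $q^* = k\|\rho_n\|_\infty = 1/2$. A minimal Python sketch; the block length $n$, the rate $R$ and the fidelity value are hypothetical inputs, and the function names are ours.

```python
import math

def min_fidelity_bound(F_e, q_star):
    """Right-hand side of Eq. (55): 1 - (1 - F_e) / (1 - q*)."""
    return 1 - (1 - F_e) / (1 - q_star)

def maximally_mixed_case(n, R, F_e):
    """First case: rho_n = 1/dim H_n with dim H_n = 2^floor(nR) and k = dim H_n / 2."""
    dim = 2 ** math.floor(n * R)
    k = dim // 2
    q_star = k / dim                      # Eq. (56): k * ||rho_n||_inf
    return dim, k, q_star, min_fidelity_bound(F_e, q_star)

print(maximally_mixed_case(n=10, R=0.5, F_e=0.99))
```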

The second case of interest is that of a source satisfying the quantum asymptotic equipartition property (QAEP). That is to say, for large $n$ the Hilbert space $\mathcal{H}_n$ can be decomposed into a subspace on which $\rho_n$ essentially looks like a multiple of the identity, and a subspace of low probability: Given any $\varepsilon > 0$, for large enough $n$ essentially all the eigenvalues $\lambda$ of a QAEP quantum source $(\rho_n)_{n\in\mathbb{N}}$ with entropy rate $R$ are concentrated in a so-called $\varepsilon$-typical subspace, i.e.,

$$2^{-n(R+\varepsilon)} \le \lambda \le 2^{-n(R-\varepsilon)}, \qquad (66)$$

in the sense that the sum of the eigenvalues that do not satisfy Equation (66) can be made arbitrarily small. We can then conclude that $\|\rho_n\|_\infty \le 2^{-n(R-\varepsilon)}$ for large $n$. Hence, if we choose $k \approx 2^{n(R-2\varepsilon)}$, we can guarantee that $q^* \to 0$, and once again the worst case fidelity has to go to 1. This case covers product sources and stationary ergodic sources [28, 29, 30], and many others of interest. The discussion of the equivalence between the minimum fidelity version and Schumacher's entanglement fidelity version of channel capacity in [7] is limited to this case.

Does the equivalence hold even without the equipartition property? We will give a counterexample below, which is, however, rather artificial from the point of view of typical coding situations: the dimension of the spaces $\mathcal{H}_n$ grows superexponentially. This is indeed necessary. For if we have an upper bound $\dim\mathcal{H}_n \le r^n$ for some positive constant $r$, $S(\rho_n) \approx nR$, and $k \approx 2^{n(R-\varepsilon)}$, we find that $q^*$ in Equation (57) goes to the constant $(\operatorname{ld}r - R)/(\operatorname{ld}r - R + \varepsilon) < 1$. Therefore, the maximal subspace fidelity in Equation (55) in Proposition 4.5 goes to one if the entanglement fidelity does.
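The limit claimed here is easy to check numerically by inserting $\operatorname{ld}d = n\operatorname{ld}r$, $S(\rho_n) = nR$ and $\operatorname{ld}k = n(R - \varepsilon)$ into Equation (57). A minimal Python sketch; the values $r = 4$, $R = 1$, $\varepsilon = 0.25$ are hypothetical and only need to satisfy $R \le \operatorname{ld}r$.

```python
import math

def q_star_entropic(n, R, eps, r):
    """Eq. (57) with ld d = n ld r, S(rho_n) = nR and ld k = n(R - eps)."""
    ld_d = n * math.log2(r)
    return (1 + ld_d - n * R) / (ld_d - n * (R - eps))

def q_star_limit(R, eps, r):
    """Limit of q_star_entropic as n -> infinity."""
    return (math.log2(r) - R) / (math.log2(r) - R + eps)

for n in (10, 1_000, 100_000):
    print(n, q_star_entropic(n, R=1.0, eps=0.25, r=4))
print("limit", q_star_limit(R=1.0, eps=0.25, r=4))
```

Since the limit stays strictly below 1, the bound (55) becomes $1 - (1 - F_e)/(1 - q^*)$ with a bounded factor $1/(1 - q^*)$, which is why the subspace fidelity follows the entanglement fidelity to 1.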



Counterexample: Here we show the claim that the Schumacher capacity with unconstrained sources is infinite for all channels with positive quantum capacity. In fact, suppose that we are given a coding scheme with channel fidelity going to 1. Then we simply enlarge the Hilbert space $\mathcal{H}_n$ by a direct summand $\mathcal{K}_n$ of some large dimension, and let

$$\rho_n = (1 - \varepsilon_n)\frac{\mathbb{1}_{\mathcal{H}_n}}{\dim\mathcal{H}_n} + \varepsilon_n\frac{\mathbb{1}_{\mathcal{K}_n}}{\dim\mathcal{K}_n}. \qquad (67)$$

The coding operations on $\mathcal{K}_n$ can be completely depolarizing, for as long as $\varepsilon_n \to 0$, the entanglement fidelity of this source goes to 1, as required. On the other hand, the entropy of this source satisfies

$$S(\rho_n) \ge \varepsilon_n\big(-\operatorname{ld}\varepsilon_n + \operatorname{ld}\dim\mathcal{K}_n\big). \qquad (68)$$

Clearly, we can make $S(\rho_n)/n$ diverge if only we let $\dim\mathcal{K}_n$ go to $\infty$ fast enough.
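A quick numerical illustration of the bound (68): the exact entropy of the source (67) is $H_2(\varepsilon_n) + (1 - \varepsilon_n)\operatorname{ld}\dim\mathcal{H}_n + \varepsilon_n\operatorname{ld}\dim\mathcal{K}_n$, which the sketch evaluates from the logarithms of the dimensions to avoid overflow. The choices $\varepsilon_n = 1/n$, $\dim\mathcal{H}_n = 2^{n/2}$ and $\dim\mathcal{K}_n = 2^{n^3}$ are hypothetical, picked only to show $S(\rho_n)/n$ diverging. Plain Python; function names are ours.

```python
import math

def source_entropy(eps, ld_dim_H, ld_dim_K):
    """Exact von Neumann entropy (in bits) of rho_n from Eq. (67)."""
    h2 = -eps * math.log2(eps) - (1 - eps) * math.log2(1 - eps)
    return h2 + (1 - eps) * ld_dim_H + eps * ld_dim_K

def entropy_lower_bound(eps, ld_dim_K):
    """Right-hand side of Eq. (68)."""
    return eps * (-math.log2(eps) + ld_dim_K)

for n in (10, 100, 1000):
    eps = 1 / n
    S = source_entropy(eps, ld_dim_H=0.5 * n, ld_dim_K=float(n) ** 3)
    print(n, S / n, entropy_lower_bound(eps, float(n) ** 3) / n)
```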



4.5. Entanglement Generation Capacity

We now focus on Devetak's [3] entanglement generation capacity, as introduced in Section 2.5, and verify that it is fully equivalent to the definitions discussed above. The proof is based on teleportation [5] and therefore involves classical forward communication from the encoding to the decoding apparatus. However, this additional resource is shown in Section 6.1 not to affect the quantum channel capacity.

Due to the additional freedom of choosing an arbitrary pure input state $\Gamma \in \mathcal{H} \otimes \mathcal{H}$ instead of the maximally entangled state $\Omega$, the entanglement generation capacity is certainly no smaller than the capacity based on channel fidelity, which was shown to be a valid figure of merit in the previous section. So we only need to prove the converse. This is easily done with the help of teleportation: In the entanglement generation scenario, the sender and receiver end up sharing a state $\sigma := (DTE \otimes \mathrm{id})(|\Gamma\rangle\langle\Gamma|)$ which has asymptotically perfect overlap with the maximally entangled state,

$$F := \langle\Omega|\sigma|\Omega\rangle \ge 1 - \varepsilon \qquad (69)$$

for some (small) $\varepsilon > 0$. The output system can thus be readily interpreted as being in the maximally entangled state with probability $F \approx 1$, and hence can be used as a resource in the standard teleportation protocol [5] to transfer arbitrary quantum states from the sender to the receiver with fidelity no smaller than $F$, at the same rate $R$.



5. Isometric Encoding Suffices 

In this section we will show that if there exists a coding scheme that achieves high fidelity transmission for a given source, there is another coding scheme with isometric encoding, as in Equation (11), that also achieves high fidelity transmission. It then directly follows that in the definition of channel capacity we may restrict our attention to isometric encodings, as claimed in Section 2. While this result is originally due to Barnum et al. [7], here we give a slightly generalized version of A. S. Holevo's presentation (cf. Chapter 9 of [11]). All we need is the following

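Proposition 5.1 Let $\mathcal{H}$, $\mathcal{K}$ be Hilbert spaces with dimensions $\eta := \dim\mathcal{H}$ and $k := \dim\mathcal{K}$. Let $\rho \in \mathcal{B}_*(\mathcal{H})$ be a density operator, and $E: \mathcal{B}_*(\mathcal{H}) \to \mathcal{B}_*(\mathcal{K})$ a completely positive map such that $\operatorname{tr}(E(\rho)) = 1$. Let $T: \mathcal{B}_*(\mathcal{K}) \to \mathcal{B}_*(\mathcal{H})$ be a channel. We may then find a channel $\hat E: \mathcal{B}_*(\mathcal{H}) \to \mathcal{B}_*(\mathcal{K})$ such that

$$F_e(\rho, T\hat E) \ge \big(F_e(\rho, TE)\big)^2, \qquad (70)$$

where for $\sigma \in \mathcal{B}_*(\mathcal{H})$ we have

$$\hat E(\sigma) = \begin{cases} V\sigma V^* & \eta \le k, \\ W^*\sigma W\ \text{(suitably renormalized)} & \eta > k, \end{cases} \qquad (71)$$

with isometries $V: \mathcal{H} \to \mathcal{K}$ and $W: \mathcal{K} \to \mathcal{H}$, respectively.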

Proof: Let $\{t_i\}_{i=1,\dots,\tau}$ and $\{e_j\}_{j=1,\dots,\epsilon}$ be sets of Kraus operators for the maps $T$ and $E$, respectively. By Equation (7) we have

$$F_e(\rho, TE) = \sum_{i,j}|\operatorname{tr}t_i e_j\rho|^2 = \sum_{i,j}|X_{ij}|^2, \qquad (72)$$

where $X_{ij} := \operatorname{tr}t_i e_j\rho$. If $\tau \ne \epsilon$, add zero components so that $X$ becomes an $(m \times m)$ square matrix, $m := \max\{\tau, \epsilon\}$.

By the singular value decomposition we may find unitary matrices $A$, $B$ such that $X = ADB$, where $D$ is diagonal with real non-negative entries. Since this decomposition simply corresponds to a change of the Kraus representation of $T$ and $E$, we may assume without loss that $X$ is diagonal already, and thus

$$F_e(\rho, TE) = \sum_{i=1}^{m} X_{ii}^2 = \sum_{i=1}^{m}(\operatorname{tr}t_i e_i\rho)^2. \qquad (73)$$

Now for $k = 1, \dots, m$, we define $\lambda_k := \operatorname{tr}(e_k\rho e_k^*)$. Let $\rho = \sum_j p_j|\psi_j\rangle\langle\psi_j|$, $p_j > 0$, $\sum_j p_j = 1$, be a diagonal representation of the density operator $\rho$. Then $\lambda_k = \sum_j p_j\|e_k\psi_j\|^2$, and $\operatorname{tr}t_k e_k\rho = \sum_j p_j\langle\psi_j|t_k e_k|\psi_j\rangle$. Thus, $\lambda_k = 0$ implies that $e_k\psi_j = 0$ for all $j$, and therefore $\operatorname{tr}t_k e_k\rho = 0$, so that these terms do not contribute to the sum in Equation (73). We may therefore assume without loss that $\lambda_k > 0$ for all $k = 1, \dots, m$. Moreover,

$$\sum_{k=1}^{m}\lambda_k = \sum_{k=1}^{m}\operatorname{tr}e_k\rho e_k^* = \operatorname{tr}E(\rho) = 1, \qquad (74)$$

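where in the last step we have used that $E$ is trace-preserving on the state $\rho$. Since

$$F_e(\rho, TE) = \sum_{i=1}^{m}\lambda_i\,\frac{(\operatorname{tr}t_i e_i\rho)^2}{\lambda_i}, \qquad (75)$$

we may find an index $k$ such that

$$(\operatorname{tr}te\rho)^2 = \frac{(\operatorname{tr}t_k e_k\rho)^2}{\lambda_k} \ge F_e(\rho, TE), \qquad (76)$$

where we have introduced the short-hands $e := e_k/\sqrt{\lambda_k}$ and $t := t_k$, respectively.

Applying the Schwarz inequality we see that

$$\begin{aligned}
(\operatorname{tr}te\rho)^2 &= \big|\operatorname{tr}\big((t^*\sqrt{\rho})^*\, e\sqrt{\rho}\big)\big|^2 \\
&\le \operatorname{tr}(tt^*\rho)\,\operatorname{tr}(e^*e\rho) \\
&= \operatorname{tr}(|t^*|^2\rho),
\end{aligned} \qquad (77)$$

where the last step uses $\operatorname{tr}(e^*e\rho) = \operatorname{tr}(e_k\rho e_k^*)/\lambda_k = 1$.

Let us treat the case $\eta \le k$ first: Since $t^*t \le \mathbb{1}_{\mathcal{K}}$, by working in the spectral representation one easily obtains $tt^* \le \mathbb{1}_{\mathcal{H}}$ and $|t^*|^2 \le |t^*|$, from which it follows that

$$\operatorname{tr}|t^*|^2\rho \le \operatorname{tr}|t^*|\rho = \operatorname{tr}tV\rho, \qquad (78)$$

where by $V: \mathcal{H} \to \mathcal{K}$ we denote the polar isometry of $t^*$, i.e., $t^* = V|t^*|$. Existence of this isometry requires $\eta \le k$. Since

$$|\operatorname{tr}tV\rho|^2 \le \sum_{i=1}^{m}|\operatorname{tr}t_i V\rho|^2 = F_e\big(\rho, TV(\cdot)V^*\big), \qquad (79)$$

tracing backwards our results leaves us with the following chain of inequalities:

$$\begin{aligned}
F_e\big(\rho, TV(\cdot)V^*\big) &\ge |\operatorname{tr}tV\rho|^2 && \text{by (79)} \\
&\ge (\operatorname{tr}|t^*|^2\rho)^2 && \text{by (78)} \\
&\ge (\operatorname{tr}te\rho)^4 && \text{by (77)} \\
&\ge \big(F_e(\rho, TE)\big)^2, && \text{by (76)}
\end{aligned} \qquad (80)$$

which is what we set out to prove.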
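The construction in this proof is explicit enough to run. The sketch below (NumPy assumed; the random channels and helper names are hypothetical) follows it step by step for $\eta \le k$: build $X_{ij} = \operatorname{tr}t_i e_j\rho$, diagonalize it by a singular value decomposition, which amounts to a unitary change of both Kraus representations, pick the index of Equation (76), and take the polar isometry $V$ of $t^*$. It then compares $F_e(\rho, TV(\cdot)V^*)$ with $F_e(\rho, TE)^2$ as in Equation (70).

```python
import numpy as np

def random_kraus(in_dim, out_dim, n_ops, rng):
    """Kraus operators (out_dim x in_dim) of a random channel, cut from a random isometry."""
    G = (rng.normal(size=(n_ops * out_dim, in_dim))
         + 1j * rng.normal(size=(n_ops * out_dim, in_dim)))
    V, _ = np.linalg.qr(G)
    return [V[i * out_dim:(i + 1) * out_dim, :] for i in range(n_ops)]

def fe(rho, kraus):
    """Entanglement fidelity via Kraus operators: sum_i |tr(K_i rho)|^2."""
    return sum(abs(np.trace(K @ rho)) ** 2 for K in kraus)

def isometric_encoder(rho, t_ops, e_ops):
    """Construct V as in the proof of Proposition 5.1, case eta <= k."""
    k, eta = e_ops[0].shape
    m = max(len(t_ops), len(e_ops))
    t_ops = list(t_ops) + [np.zeros((eta, k))] * (m - len(t_ops))    # pad with zeros
    e_ops = list(e_ops) + [np.zeros((k, eta))] * (m - len(e_ops))
    X = np.array([[np.trace(t @ e @ rho) for e in e_ops] for t in t_ops])
    A, D, Bh = np.linalg.svd(X)                                        # X = A diag(D) Bh
    # Unitary change of Kraus representations that makes X diagonal, cf. Eq. (73)
    t_new = [sum(A[i, a].conj() * t_ops[i] for i in range(m)) for a in range(m)]
    e_new = [sum(Bh[b, j].conj() * e_ops[j] for j in range(m)) for b in range(m)]
    lam = np.array([np.trace(e @ rho @ e.conj().T).real for e in e_new])
    ratios = np.where(lam > 1e-14, D**2 / np.maximum(lam, 1e-300), 0.0)
    t = t_new[int(np.argmax(ratios))]                                  # index of Eq. (76)
    # Polar decomposition t^* = V |t^*| from the SVD t^* = U_s diag(s) Wh, with V = U_s Wh
    U_s, _, Wh = np.linalg.svd(t.conj().T, full_matrices=False)
    return U_s @ Wh

rng = np.random.default_rng(10)
eta, k = 2, 3
E_ops = random_kraus(eta, k, 2, rng)      # encoding channel H -> K
T_ops = random_kraus(k, eta, 3, rng)      # noisy channel K -> H
rho = np.eye(eta) / eta                   # chaotic state
before = fe(rho, [t @ e for t in T_ops for e in E_ops])
V = isometric_encoder(rho, T_ops, E_ops)
after = fe(rho, [t @ V for t in T_ops])
print(before, after, after >= before**2)
```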

If $\eta > k$, the proof proceeds very similarly: We denote the polar isometry of $t$ by $W: \mathcal{K} \to \mathcal{H}$, i.e., $t = W|t|$. From Equation (77) we may then conclude that

$$\begin{aligned}
(\operatorname{tr}te\rho)^2 &\le \operatorname{tr}tt^*\rho \\
&= \operatorname{tr}W|t|^2W^*\rho \\
&\le \operatorname{tr}W|t|W^*\rho \\
&= \operatorname{tr}tW^*\rho,
\end{aligned} \qquad (81)$$

where in the second to last step we have used that $t^*t \le \mathbb{1}_{\mathcal{K}}$, and thus $|t|^2 \le |t|$. Substituting $W^*$ for $V$, we now mimic Equations (79) and (80) to conclude that $(F_e(\rho, TE))^2 \le F_e(\rho, TW^*(\cdot)W)$. The map $W^*(\cdot)W$, while being completely positive, is not necessarily trace-preserving. The desired result then follows by renormalization, as above in Equations (22) and (54). ■

In Proposition 5.1 we have included cases in which the input space of the channel is strictly smaller than the input space of the encoding map, $k = \dim\mathcal{K} < \dim\mathcal{H} = \eta$. Though we do not need to consider this situation in our setting, it may prove helpful in other applications to avoid cumbersome distinctions of cases, and thus has been added for convenience.

To arrive at the statement that isometric encoding suffices, all we need to do then is to combine Proposition 5.1 with the channel fidelity definition of quantum capacity (Section 2.5), taking $\rho$ to be the chaotic density matrix, $E$ to be the encoding channel, and thinking of $T$ as the concatenation of quantum channel and decoding channel.

6. Classical Side Information 

As claimed in Section 2, it is a straightforward consequence of Proposition 5.1 that classical forward communication has no effect on the quantum channel capacity [25, 7]. However, before entering the proof we need to make a few more comments on the assisted channel $T \otimes \mathrm{id}^c_A$, where $T: \mathcal{B}_*(\mathcal{H}_1) \to \mathcal{B}_*(\mathcal{H}_2)$ is an arbitrary quantum channel and $\mathrm{id}^c_A$ denotes the identity on a classical system with $A \in \mathbb{N}$ states. Thus, in the limit of large dimensions an $n$-fold tensor product of the channel $T$ will be assisted by a classical system with a total of $A^n$ states.

The results we are going to present in this section apply slightly more generally: Instead of fixing a priori a side channel of given (if arbitrarily large) dimension, the encoder may choose the size of the side channel in the encoding process, which also covers the case of super-exponentially growing side channels. The capacity of a channel $T$ assisted by this type of classical forward communication will be marked with a subscript, $Q_{cf}(T)$. This generalization will play a role in Section 6.2.

Of course, $Q_{cf}(T) \ge Q(T \otimes \mathrm{id}^c_A) \ge Q(T)$. It is the aim of this section to show that all these capacities are equal.

6.1. Classical Forward Communication does not Increase the Channel Capacity 

Obviously, for classically assisted channels the encoding is a channel with both a classical and a quantum output. Such channels are usually called instruments [47], and can be thought of as a collection of completely positive, trace-nonincreasing maps $\{E_\lambda\}_{\lambda=1,\dots,A}$ summing up to a channel $E = \sum_{\lambda=1}^{A}E_\lambda$. The index $\lambda \in \{1, \dots, A\}$ represents classical information that may be obtained in the encoding process and sent undisturbed to the decoder over a noiseless classical channel. Depending on the value of $\lambda$, one channel $D_\lambda$ out of a collection of trace-preserving quantum channels $\{D_\lambda\}_{\lambda=1,\dots,A}$ is used in the decoding process.

The definition of achievable rates and channel capacity now completely parallels the definition of the unassisted quantities. Here we focus on the channel fidelity version of channel capacity (cf. Section 2.5), since Proposition 5.1 is well suited for this definition. Of course, all other definitions of channel capacity can be extended to classically assisted capacities in the same spirit.

We may thus say that $R$ is an achievable rate for the classically assisted quantum channel $T$ iff there is a sequence of Hilbert spaces $(\mathcal{K}_n)_{n\in\mathbb{N}}$ satisfying $\lim_{n\to\infty}\frac{\operatorname{ld}\dim\mathcal{K}_n}{n} = R$ and a sequence of encodings $(E_{\lambda_n,n})_{\lambda_n=1,\dots,A_n,\,n\in\mathbb{N}}$ and decodings $(D_{\lambda_n,n})_{\lambda_n=1,\dots,A_n,\,n\in\mathbb{N}}$ for some integer sequence $(A_n)_{n\in\mathbb{N}}$ such that

$$\lim_{n\to\infty}\sum_{\lambda_n=1}^{A_n}F_c\big(D_{\lambda_n,n}T^{\otimes n}E_{\lambda_n,n}\big) = 1. \qquad (82)$$

The quantum capacity $Q_{cf}(T)$ of the channel $T$ with classical forward communication is then defined as the supremum of all achievable rates.

Of course, the capacity of the channel $T \otimes \mathrm{id}^c_A$ for fixed $A \in \mathbb{N}$ is obtained by setting $A_n := A^n$ in the above definition.

Theorem 6.1 Let $T: \mathcal{B}_*(\mathcal{H}_1) \to \mathcal{B}_*(\mathcal{H}_2)$ be a quantum channel and $A \in \mathbb{N}$. We then have

$$Q_{cf}(T) = Q(T \otimes \mathrm{id}^c_A) = Q(T). \qquad (83)$$

Before giving the proof let us consider a seeming generalization of this theorem, which allows the side channel $R$ to be any separable channel, i.e., a channel $R = R_2R_1$ operating by first collecting classical information, by a channel $R_1$, say, and then recoding this into quantum information by another channel $R_2$. Equivalently, $(\mathrm{id} \otimes R)$ maps any input state to a separable state, so these channels are also called "entanglement breaking" [48, 49]. Then, for every such $R$ and any channel $T$ we have the following

Corollary 6.2 Separable side channels do not increase the quantum channel capacity, i.e., for any quantum channel $T$ and any separable channel $R = R_2 \circ R_1$ we have

$$Q(T \otimes R) = Q(T). \qquad (84)$$

Proof of Corollary 6.2: For a separable channel $R = R_2 \circ R_1 = R_2 \circ \mathrm{id}^c_A \circ R_1$, we have

$$\begin{aligned}
Q(T) \le Q(T \otimes R) &= Q\big((\mathrm{id} \otimes R_2)(T \otimes \mathrm{id}^c_A)(\mathrm{id} \otimes R_1)\big) \\
&\le Q(T \otimes \mathrm{id}^c_A) = Q(T),
\end{aligned} \qquad (85)$$

where the first inequality follows by using codings ignoring the channel $R$, the second follows by the bottleneck inequality (15), and in the last step we have applied Theorem 6.1. From this chain of inequalities we get $Q(T) = Q(T \otimes R)$, just as claimed. ■

We now follow [7] in the

Proof of Theorem 6.1: From our remarks it is clear that we only need to prove the inequality $Q_{cf}(T) \le Q(T)$. Given a sequence of Hilbert spaces $(\mathcal{K}_n)_{n\in\mathbb{N}}$ and suitable encodings $(E_{\lambda_n,n})_{\lambda_n=1,\dots,A_n,\,n\in\mathbb{N}}$ and decodings $(D_{\lambda_n,n})_{\lambda_n=1,\dots,A_n,\,n\in\mathbb{N}}$ such that the classically assisted channel $T$ achieves the rate $R$ with channel fidelity approaching one, we will show that the same rate can be achieved without using the side channel. From the definition of channel capacity, for any $\varepsilon > 0$ we may find $n_\varepsilon \in \mathbb{N}$ such that

$$\sum_{\lambda_n=1}^{A_n}F_c\big(D_{\lambda_n,n}T^{\otimes n}E_{\lambda_n,n}\big) \ge 1 - \varepsilon \quad \forall n \ge n_\varepsilon. \qquad (86)$$

For the remainder of the proof we will fix $n \ge n_\varepsilon$ and drop the index to streamline the notation. Setting $e_\lambda := \operatorname{tr}E_\lambda(\mathbb{1})/d$ with $d := \dim\mathcal{K}_n$ and

$$\tilde F_c(D_\lambda TE_\lambda) := \frac{1}{e_\lambda}F_c(D_\lambda TE_\lambda), \qquad (87)$$
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we may then rewrite Equation l|86[) as 

A 



(88) 



Since e\ = 1, there is an index /i such that F^D^ T E^) > 1 — e. However, 



and we may therefore apply Proposition l5 . Il to conclude that there is a channel E such 
that 



from which it follows that the same rate can be achieved without relying on the 
classical side channel. ■ 

6.2. Average Fidelity by Forward Communication 

We already know that average fidelity may be considered a suitable error criterion for 
capacity purposes. The line of thought we followed to establish this fact proceeded 
via the equivalence of average fidelity and channel fidelity, Proposition l4.4l and is thus 
ultimately based on Proposition 14.51 

The results of the previous section regarding the uselessness of classical forward 
communication can be employed to give an alternative proof that average fidelity 
serves as a valid distance measure, making use only of the sufficiency of isometric 
encoding. 

To this end, we will show that instead of evaluating the average fidelity F(T) 
for a given channel T, we may just as well compute the minimum fidelity F(T), 
where the new channel T is simply the old channel T augmented by classical forward 
communication. However, by Theorem l6 . 1 I the quantum capacities of T and T coincide, 
and therefore average fidelity and minimum fidelity turn out to be equivalent error 
quantities. 

While our concept of classically assisted channel capacity, as presented in Section 
16.11 only allows for discrete classical messages and therefore involves the calculation 
of finite sums, for the evaluation of average fidelity we will rather deal with integrals, 
corresponding to continuous classical messages. However, this extension poses no dif- 
ficulties. 

So suppose we have at hand a quantum channel T: B*(Hi) — > B*(H.2) together with 
encoding channel E: B*{H.?,) — > and decoding channel D: B*(Ti.2) — » B*{TL?,). 

By Equation (JSJ the average fidelity of the concatenation D o T o E then reads 



where integration is over ali unitaries U £ B Sf {TL' ì ), and g = IV'XV'I G B^Tiz) 
is an arbitrary reference state. We will now convert DTE into a channel with 
classical forward communication. Denoting by C(X) the vector space of continuous 




(89) 



F c {DpTÉ) > (1-e) 2 , 



(90) 




(91) 
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functions on the set X, we may define D: B^(TÌ2) ® C(U(TL^)) — > B*(H?) and 
B: #*(H 3 ) -> B*(Wi)®C(J7(W 3 )), asfollows: 



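$$\tilde D(\varrho \otimes f) := \int U^* D(\varrho)\, U\, f(U)\, dU, \qquad \tilde E_U(\sigma) := E(U \sigma U^*),$$

where in the definition of $\tilde E$ we have made use of the fact that $\mathcal B(\mathcal H) \otimes C(X)$ is isomorphic to the $\mathcal B(\mathcal H)$-valued functions on $X$. We then see that for any state $\varrho = |\psi\rangle\langle\psi|$

$$\begin{aligned}
\langle\psi|\, \tilde D\big(T \otimes \mathrm{id}_{C(\mathcal U(\mathcal H_3))}\big) \tilde E(|\psi\rangle\langle\psi|)\, |\psi\rangle
&= \operatorname{tr}\Big(\varrho\, \tilde D\big(T \otimes \mathrm{id}_{C(\mathcal U(\mathcal H_3))}\big) \tilde E(\varrho)\Big) \\
&= \operatorname{tr}\Big(\varrho \int U^* DTE(U \varrho U^*)\, U \, dU\Big).
\end{aligned} \tag{93}$$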

The average fidelity of the concatenation $DTE$ thus equals the pure-state fidelity of the classically assisted channel $\tilde D(T \otimes \mathrm{id})\tilde E$ for every input state, and hence also its minimum pure-state fidelity, which is what we wanted to show.
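As a numerical illustration of this construction, the Python sketch below estimates the fidelity of the classically assisted channel for a single qubit by Monte Carlo. Several assumptions are ours and not from the text: the concatenation $DTE$ is modeled by a hypothetical amplitude-damping channel with parameter `gamma`, unitaries are represented by the rotations they induce on the Bloch ball, and the Haar integral is replaced by a sample mean over uniformly random unit quaternions. The sampled rotation plays the role of the classical message: the encoder applies it before the channel and the decoder undoes it, mirroring $\tilde E$ and $\tilde D$. For this channel the average fidelity evaluates to $(4 - \gamma + 2\sqrt{1-\gamma})/6$, and the estimate should not depend on the reference state.

```python
import math
import random


def random_rotation(rng):
    """Haar-random element of SO(3), built from a uniformly random unit quaternion."""
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def apply(matrix, vec):
    """Multiply a 3x3 matrix with a 3-vector."""
    return [sum(matrix[i][j] * vec[j] for j in range(3)) for i in range(3)]


def transpose(matrix):
    """Transpose of a 3x3 matrix; for a rotation this is its inverse."""
    return [[matrix[j][i] for j in range(3)] for i in range(3)]


def amplitude_damping(bloch, gamma):
    """Hypothetical stand-in for DTE: qubit amplitude damping towards |0> (Bloch picture)."""
    s = math.sqrt(1 - gamma)
    x, y, z = bloch
    return [s * x, s * y, gamma + (1 - gamma) * z]


def assisted_fidelity(psi_bloch, gamma, samples=20000, seed=0):
    """Monte Carlo estimate of tr(rho * integral of U* DTE(U rho U*) U dU)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        rot = random_rotation(rng)  # classical message: which unitary was applied
        encoded = apply(rot, psi_bloch)  # encoder: apply U before the channel
        received = amplitude_damping(encoded, gamma)
        decoded = apply(transpose(rot), received)  # decoder: undo U using the message
        # overlap of the pure reference state (Bloch vector r0) with a state of Bloch vector r
        total += 0.5 * (1 + sum(a * b for a, b in zip(psi_bloch, decoded)))
    return total / samples


def exact_average_fidelity(gamma):
    """Closed-form average fidelity of qubit amplitude damping."""
    return (4 - gamma + 2 * math.sqrt(1 - gamma)) / 6


print(assisted_fidelity([0.0, 0.0, 1.0], 0.5), exact_average_fidelity(0.5))
```

Running the estimate for two different reference states gives two numbers that agree with each other and with the closed form up to sampling error, which is the independence of the reference state used above.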

Note that for this proof to apply also in the setting of $n$-fold tensor products, as required by the definition of channel capacity, we may not restrict ourselves to exponentially growing side channels of the product form $(T \otimes \mathrm{id}_{C(\mathcal U(\mathcal H_3))})^{\otimes n}$: this would correspond to an averaging over $n$-fold tensor products of unitary operators, $U_1 \otimes U_2 \otimes \cdots \otimes U_n$. However, not all unitary operators on an $n$-fold tensor product are of this form.
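A concrete check that product unitaries do not exhaust all unitaries on a tensor product can be made with the realignment (operator Schmidt) criterion: a nonzero operator on $\mathbb C^2 \otimes \mathbb C^2$ is of the form $A \otimes B$ exactly when its realigned $4 \times 4$ matrix has rank one. The Python sketch below applies this to the Kronecker product of two Pauli matrices and to CNOT; the choice of CNOT as the non-product example is ours, and all helper names are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two 2x2 matrices given as nested lists."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]


def realign(m):
    """Realignment R[(i,j),(k,l)] = M[(i,k),(j,l)].

    Here i, j are the row and column indices of the first tensor factor and
    k, l those of the second, so that realign(kron(A, B)) = vec(A) vec(B)^T.
    """
    r = [[0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            for k in range(2):
                for l in range(2):
                    r[2 * i + j][2 * k + l] = m[2 * i + k][2 * j + l]
    return r


def is_rank_one(r, tol=1e-12):
    """A nonzero matrix has rank one iff all of its 2x2 minors vanish."""
    if all(abs(x) <= tol for row in r for x in row):
        return False
    for a in range(4):
        for b in range(a + 1, 4):
            for c in range(4):
                for d in range(c + 1, 4):
                    if abs(r[a][c] * r[b][d] - r[a][d] * r[b][c]) > tol:
                        return False
    return True


def is_product_operator(m):
    """True iff the 4x4 operator m factorizes as A (x) B."""
    return is_rank_one(realign(m))


X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

print(is_product_operator(kron(X, Z)), is_product_operator(CNOT))
```

CNOT realigns to a rank-two matrix, so no average over product unitaries $U_1 \otimes U_2$ can reproduce the full unitary average on the two-fold tensor product.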



7. Testing Only One Sequence 

We will now prove the claim made in Section 2.2: If a coding scheme construction works for a certain pair of integer sequences $(N_\mu)_{\mu\in\mathbb N}$, $(M_\mu)_{\mu\in\mathbb N}$ such that the rate $R$ is achieved infinitely often, i.e., $\lim_{\mu\to\infty} M_\mu/N_\mu = R$, and the error tends to zero, $\lim_{\mu\to\infty} \Delta(N_\mu, M_\mu) = 0$, then coding works for all such pairs.

As mentioned in Section 2.2, this requires extending a given coding scheme to more block sizes. Therefore this section will be organized by extension method: In Section 7.1 we use only the method of wasting resources, i.e., either using the coded channels for fewer bits than allowed by the given coding scheme (i.e., decreasing $m_\nu$) or requiring some additional channel uses (thus increasing $n_\nu$) and simply not using them. This will allow the extension whenever we can find a subsequence along which, on the one hand, the desired rate is achieved and which, on the other hand, does not grow too fast.

A second method would be to use blocks from the given coding scheme and put them together as tensor products, to get to larger block sizes. We show in Section 7.2 by an explicit example that this method, combined with the wasteful one, is not sufficient to extend a very sparse coding sequence to all large block sizes.

Finally, we show in Section 7.3, based on the work in [24], how hashing codes can be used to achieve the desired extension in all cases.

Throughout, we will denote by $(N_\mu)_{\mu\in\mathbb N}$, $(M_\mu)_{\mu\in\mathbb N}$ the given coding sequences, and assume, without loss of generality, that the sporadic rate is attained, i.e., $R = \limsup_{\mu\to\infty} M_\mu/N_\mu = \lim_{\mu\to\infty} M_\mu/N_\mu$. The sequences to which we seek to extend the scheme are denoted by $(n_\nu)_{\nu\in\mathbb N}$ and $(m_\nu)_{\nu\in\mathbb N}$, as before.
7.1. Subexponential Sequences

Obviously, good coding becomes easier the more parallel channels are available for transmission. Moreover, if a certain coding scheme works for some Hilbert space $\mathcal H$, it works at least as well for states supported on a lower dimensional subspace $\mathcal H' \subset \mathcal H$. Thus, the error quantity $\Delta(n, m)$, as introduced in Definition 2.1, has the following monotonicity properties:

$$\Delta(n + 1, m) \le \Delta(n, m) \le \Delta(n, m + 1) \tag{94}$$

for all positive integers $n, m$. We call a diverging sequence $(N_\mu)_{\mu\in\mathbb N}$ subexponential if

$$\lim_{\mu\to\infty} \frac{N_{\mu+1}}{N_\mu} = 1. \tag{95}$$

This covers, for example, all arithmetic sequences, and polynomially growing ones. For such sequences the desired result follows directly from the following Lemma, which slightly generalizes Lemma 3.2 of [24].

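Lemma 7.1 Suppose $\Delta: \mathbb N \times \mathbb N \to \mathbb R_+$ satisfies the monotonicity properties (94). Let $(N_\mu)_{\mu\in\mathbb N}$, $(M_\mu)_{\mu\in\mathbb N}$ be a pair of integer sequences such that $(N_\mu)_{\mu\in\mathbb N}$ is subexponential, cf. Equation (95), and $\lim_{\mu\to\infty} \Delta(N_\mu, M_\mu) = 0$.

Then for any pair of integer sequences $(n_\nu)_{\nu\in\mathbb N}$, $(m_\nu)_{\nu\in\mathbb N}$ such that $\lim_{\nu\to\infty} n_\nu = \infty$ and

$$\limsup_{\nu\to\infty} \frac{m_\nu}{n_\nu} < \liminf_{\mu\to\infty} \frac{M_\mu}{N_\mu},$$

we have $\lim_{\nu\to\infty} \Delta(n_\nu, m_\nu) = 0$.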

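Proof: If we have only the monotonicity property of $\Delta$ to draw upon, the way to show that $\Delta(n_\nu, m_\nu) \to 0$ is to find a suitable index $\mu = \mu(\nu)$ for all sufficiently large $\nu$ so that $\Delta(n_\nu, m_\nu) \le \Delta(N_{\mu(\nu)}, M_{\mu(\nu)})$, for which we need

$$n_\nu \ge N_{\mu(\nu)} \quad\text{and}\quad m_\nu \le M_{\mu(\nu)}. \tag{96}$$

The first inequality we will ensure by defining

$$\mu(\nu) = \min\{\alpha \mid N_\alpha > n_\nu\} - 1. \tag{97}$$

Then

$$N_{\mu(\nu)} \le n_\nu < N_{\mu(\nu)+1}, \tag{98}$$

and $\lim_{\nu\to\infty} \mu(\nu) = \infty$. Hence it remains to show that the second inequality in Equation (96) holds for all sufficiently large $\nu$. We consider

$$\frac{m_\nu}{M_{\mu(\nu)}} = \frac{m_\nu}{n_\nu} \cdot \frac{n_\nu}{N_{\mu(\nu)+1}} \cdot \frac{N_{\mu(\nu)+1}}{N_{\mu(\nu)}} \cdot \frac{N_{\mu(\nu)}}{M_{\mu(\nu)}}. \tag{99}$$

In this product the second factor is $< 1$ by Equation (98), and the third converges to 1 because $(N_\mu)_{\mu\in\mathbb N}$ is subexponential. Now pick $R_-, R_+$ such that the strict inequalities

$$\limsup_{\nu\to\infty} \frac{m_\nu}{n_\nu} < R_- < R_+ < \liminf_{\mu\to\infty} \frac{M_\mu}{N_\mu} \tag{100}$$

hold. Then for all sufficiently large $\nu$ the first factor in Equation (99) is $< R_-$ and the last factor is $< 1/R_+$. Hence the product of the first and last factors in Equation (99) is $< R_-/R_+ < 1$. Since the third factor converges to 1, it is eventually smaller than $R_+/R_-$, so the whole product is eventually $< 1$. Hence Equation (96) holds for all sufficiently large $\nu$. ■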
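The index choice in this proof is easy to mechanize. The Python sketch below uses hypothetical sequences of our own choosing: $N_\mu = \mu^2$, $M_\mu = \lfloor \mu^2/2 \rfloor$ for the given codes (rate $1/2$), and targets $n_\nu = \nu$, $m_\nu = \lfloor 2\nu/5 \rfloor$ (rate $2/5 < 1/2$). It computes $\mu(\nu)$ from Equation (97), checks both inequalities of Equation (96), and evaluates the ratio in Equation (95) to contrast the subexponential sequence $\mu^2$ with the exponential sequence $2^\mu$. Only block sizes are handled; the error function $\Delta$ itself is not modeled.

```python
def mu_of(n, N):
    """Eq. (97): mu(nu) = min{alpha | N_alpha > n} - 1 (indices start at 1)."""
    alpha = 1
    while N(alpha) <= n:
        alpha += 1
    return alpha - 1


def check_96(n, m, N, M):
    """True if the index mu(nu) of Eq. (97) satisfies both inequalities of Eq. (96)."""
    mu = mu_of(n, N)
    return mu >= 1 and N(mu) <= n and m <= M(mu)


def ratio(N, mu):
    """Successive ratio N_{mu+1} / N_mu appearing in Eq. (95)."""
    return N(mu + 1) / N(mu)


def code_lengths(mu):
    """Hypothetical given block sizes N_mu = mu^2 (polynomial, hence subexponential)."""
    return mu * mu


def code_dims(mu):
    """Hypothetical numbers of coded systems M_mu = floor(mu^2 / 2), i.e. rate 1/2."""
    return (mu * mu) // 2


# Targets n_nu = nu, m_nu = floor(2 nu / 5) at rate 2/5 < 1/2.
failures = [nu for nu in range(1000, 5000) if not check_96(nu, (2 * nu) // 5, code_lengths, code_dims)]
print(mu_of(1000, code_lengths), failures, ratio(code_lengths, 10**4), ratio(lambda mu: 2 ** mu, 50))
```

For small $\nu$ the check can fail, which is consistent with the lemma only claiming Equation (96) for sufficiently large $\nu$.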
This result covers most sequences $(N_\mu)_{\mu\in\mathbb N}$, $(M_\mu)_{\mu\in\mathbb N}$ naturally arising for families of codes. Since it uses nothing but the monotonicity of $\Delta$, the result, in contrast to Proposition 7.2, remains useful even for the simulation of one noisy channel by another, i.e., for the definition of capacities $Q(T, S)$ with non-ideal reference channel $S$.



7.2. A Counterexample 

From the way in which subexponentiality of $(N_\mu)_{\mu\in\mathbb N}$ enters the proof of Lemma 7.1, it is not clear whether this assumption is really necessary. In this section we will give an example showing that it cannot be omitted, implying that to establish full equivalence we do need the more sophisticated techniques presented in Section 7.3. The example will also satisfy another natural constraint on the error function $\Delta(n, m)$, which reflects another elementary method of getting new coding schemes from old: we can always split the given number of channels into sub-blocks, and apply a known coding scheme to each block. The total error is then estimated as the sum of the errors of each block. Hence the error function $\Delta(n, m)$ is subadditive:

$$\Delta(n_1 + n_2, m_1 + m_2) \le \Delta(n_1, m_1) + \Delta(n_2, m_2). \tag{101}$$

Suppose we are given some codes for a possibly very sparse sequence of block sizes $N_\mu$, with $M_\mu = N_\mu$ coded bits, and $\varepsilon_\mu = \Delta(N_\mu, N_\mu) \to 0$. Then the rate 1 is sporadically achievable. The error bound we get by the best combination of blocking and possibly wasting some resources is then

$$\Delta(n, m) := \inf\Big\{ \sum_k \varepsilon_{\mu_k} \,\Big|\, m \le \sum_k N_{\mu_k} \le n \Big\}, \tag{102}$$

where the infimum is taken over all admissible families $\{N_{\mu_k}\}$ of block sizes, repetitions allowed. This satisfies both monotonicity (94) and subadditivity (101). Our aim in constructing the counterexample is to choose $(N_\mu)_{\mu\in\mathbb N}$ growing sufficiently rapidly, and $(\varepsilon_\mu)_{\mu\in\mathbb N}$ decreasing sufficiently slowly, so that $\Delta(n, m)$ can be bounded away from zero even though $m/n$ gets small.
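Equation (102) is an unbounded-knapsack-type minimization, so it can be evaluated by dynamic programming over the total block size. The Python sketch below does this for a small hypothetical family (block sizes 1, 4, 32 with decreasing errors 1/2, 3/10, 1/5, chosen by us for illustration) and checks the monotonicity properties (94) and the subadditivity (101) on a grid. Exact rational arithmetic via `fractions.Fraction` avoids rounding artifacts in these comparisons; `math.inf` marks block-size totals that cannot be reached.

```python
import math
from fractions import Fraction


def make_delta(sizes, eps, n_max):
    """Return Delta(n, m) of Eq. (102) for block sizes `sizes` with errors `eps`.

    best[s] is the least total error of a family of blocks (repetitions allowed)
    whose sizes add up to exactly s; the empty family has size 0 and error 0.
    """
    best = [math.inf] * (n_max + 1)
    best[0] = Fraction(0)
    for s in range(1, n_max + 1):
        for size, e in zip(sizes, eps):
            if size <= s and best[s - size] + e < best[s]:
                best[s] = best[s - size] + e

    def delta(n, m):
        # infimum over admissible families with m <= total size <= n
        return min((best[s] for s in range(max(m, 0), min(n, n_max) + 1)), default=math.inf)

    return delta


delta = make_delta([1, 4, 32], [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)], n_max=64)
print(delta(5, 5), delta(8, 5), delta(3, 5))
```

For instance, $\Delta(8, 5)$ is attained by two blocks of size 4 (error $3/5$), which beats the exact fit $4 + 1$ (error $4/5$): wasting resources can pay off.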

We assume that $(N_\mu)_{\mu\in\mathbb N}$ is superexponential in the sense that $N_{\mu+1}/N_\mu \to \infty$. Of $(\varepsilon_\mu)_{\mu\in\mathbb N}$ we only require, for the moment, that it decreases monotonically to zero. Then the infimum in Equation (102) never contains sums arising by breaking up a block of size $N_{\mu_k}$ into blocks of smaller sizes: in this way one would not only get more terms in the sum of $\varepsilon$'s, but each term would be larger than $\varepsilon_{\mu_k}$. Therefore we can lower bound $\Delta$ by considering only a decomposition into the largest available blocks. For $N_\mu \le m \le n < N_{\mu+1}$ this means

$$\Delta(n, m) \ge \varepsilon_\mu \left\lfloor \frac{m}{N_\mu} \right\rfloor, \tag{103}$$

where $\lfloor x \rfloor$ denotes the largest integer $\le x$. Now we choose $n_\mu = N_{\mu+1} - 1$, and $m_\mu$ close to the geometric mean: $m_\mu \approx \sqrt{N_\mu N_{\mu+1}}$. Then on the one hand $m_\mu/n_\mu \approx \sqrt{N_\mu/N_{\mu+1}} \to 0$, because $(N_\mu)_{\mu\in\mathbb N}$ is superexponential. Hence this pair of sequences has rate zero. On the other hand, if we only let $\varepsilon_\mu$ decrease slowly enough we can prevent $\Delta(n_\mu, m_\mu)$ from going to zero. For example, with $\varepsilon_\mu = \sqrt{N_\mu/N_{\mu+1}}$ we get $\Delta(n_\mu, m_\mu) \ge 1/2$ asymptotically, since $\lfloor x \rfloor \ge x/2$ for $x \ge 1$.
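The construction can be evaluated directly. The Python sketch below assumes the illustrative superexponential choice $N_\mu = 2^{\mu^2}$ and $\varepsilon_\mu = \sqrt{N_\mu/N_{\mu+1}}$ (neither is fixed by the text), uses exact integer arithmetic for the block sizes, and returns both the rate $m_\mu/n_\mu$ and the right-hand side of Equation (103).

```python
import math


def counterexample_point(mu):
    """Rate m_mu/n_mu and the lower bound of Eq. (103) for N_mu = 2**(mu**2).

    The choices N_mu = 2**(mu**2) and eps_mu = sqrt(N_mu / N_{mu+1}) are illustrative
    assumptions; they make (N_mu) superexponential and (eps_mu) decreasing to zero.
    """
    N_mu, N_next = 2 ** (mu * mu), 2 ** ((mu + 1) * (mu + 1))
    n_mu = N_next - 1
    m_mu = math.isqrt(N_mu * N_next)  # integer part of the geometric mean
    eps_mu = math.sqrt(N_mu / N_next)
    rate = m_mu / n_mu
    lower_bound = eps_mu * (m_mu // N_mu)  # eps_mu * floor(m_mu / N_mu)
    return rate, lower_bound


for mu in range(1, 9):
    print(mu, *counterexample_point(mu))
```

The rates fall towards zero while the error bound stays above $1/2$, which is exactly the failure of the wasteful extension method described above.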

To summarize: we have constructed a monotone and subadditive function $\Delta(n, m)$, for which the rate 1 can be achieved sporadically, but for which the proper achievable rate is 0.
7.3. Hashing Helps

We will now explain how hashing helps to establish full equivalence of the one-sequence 
and all-sequence definitions, showing that it is indeed sufficient to check only one pair 
of sequences when testing a given rate R. As we know from the previous subsection, 
this requires that if we have found a fairly good coding for some large block size, we 
must make better use of it than just repeating the blocks, and maybe not using some 
of the input bits. 

This problem is in essence the same as the one that arises when we have a fairly good channel to begin with: just repeating it without further encoding will not make errors go to zero. Instead they will accumulate. On the other hand, a channel which is nearly ideal should also have nearly the capacity of an ideal channel, or else the whole idea of capacity would make no sense. In fact, in our paper so far we have only shown one type of channel to have positive capacity, namely the ideal channels. As shown in Section 3.2, in that case the problem of accumulating errors simply does not occur, and all coding is with $\Delta(n, m) = 0$.

So it would actually be conceivable that capacities are always zero, unless coding can be done without errors (see also Section 2.7). However, it can be shown that small errors can be corrected with only a small loss in capacity. This problem is treated in a self-contained way in [24], and we refer to that paper for details and proofs. Here we only point out the statements needed in the present context, and sketch the main arguments.

The non-trivial codes needed for this argument are called hash codes. In [24] they are constructed as random graph codes, based on a scheme [50] which turns graphs into quantum error correcting codes of the Knill-Laflamme type. The verification that a certain number of errors is corrected by such a code amounts to showing that a certain system of linear equations is non-singular. Then the existence of codes with suitable parameters is shown by checking that this condition holds true in a generic random graph of suitable size. The random graphs are generated such that the probabilities for each edge are independent and equidistributed. This is quite different from Shannon's idea of random coding, where the distribution depends on the noise in the channel and the input state. The argument based on graph codes works in any Hilbert space dimension $d$ which is a prime number. It shows that if we want to encode $m$ systems of dimension $d$ into $n$ systems of the same dimension, and

$$\left(\frac{m + 4f}{n} - 1\right) \operatorname{ld} d + H_2\!\left(\frac{2f}{n}\right) < 0, \tag{104}$$

then we can arrange for the code to correct arbitrary errors occurring on up to $f$ of the $n$ subsystems. Here

$$H_2(p) = -p \operatorname{ld} p - (1 - p) \operatorname{ld}(1 - p) \tag{105}$$

is again the binary entropy function. Moreover, the expression in Equation (104) is an upper bound on the exponential rate at which the probability for a random graph code not to correct that many errors decreases. The crucial feature is that if $f/n$ is small, i.e., we do not require many errors to be corrected, then we can get $m$ close to $n$, i.e., the rate of the coding scheme is nearly that of the ideal channel.
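Condition (104) is straightforward to evaluate. The Python sketch below checks it for given $(n, m, f, d)$ and computes the largest ratio $m/n$ it permits for a fixed error fraction $f/n$, using logarithms to base 2 throughout as in Equation (105). It only evaluates the counting condition; it does not construct any graph code, and the function names are hypothetical.

```python
import math


def binary_entropy(p):
    """H_2(p) = -p ld p - (1-p) ld(1-p), with H_2(0) = H_2(1) = 0 (Eq. (105))."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def satisfies_104(n, m, f, d):
    """True if ((m + 4f)/n - 1) ld d + H_2(2f/n) < 0, i.e. Eq. (104) holds."""
    return ((m + 4 * f) / n - 1) * math.log2(d) + binary_entropy(2 * f / n) < 0


def max_rate(f_over_n, d):
    """Supremum of m/n allowed by Eq. (104) at a fixed error fraction f/n."""
    return 1 - 4 * f_over_n - binary_entropy(2 * f_over_n) / math.log2(d)


print(max_rate(0.0, 5), max_rate(0.01, 2), max_rate(0.01, 7))
```

As $f/n \to 0$ the admissible rate tends to 1, which is the crucial feature stressed above; larger prime dimensions $d$ dilute the entropy term further.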

The next step is to convert the correction of rare errors to that of arbitrarily small errors. Here a straightforward norm estimate is

$$\left\| D\, T^{\otimes n} E - \mathrm{id}_m \right\|_{cb} \le \left( 2^{H_2\left(\frac{f+1}{n}\right)}\, \| T - \mathrm{id} \|_{cb}^{\frac{f+1}{n}} \right)^{n}, \tag{106}$$

when $E, D$ are a code correcting arbitrary errors on up to $f$ of the $n$ subsystems, encoding $m$ input systems as above, and $\mathrm{id}_m$ denotes the ideal channel on $m$ $d$-level systems. Then as soon as the expression in parentheses is $< 1$, in particular if the channel $T$ is close to ideal, we see that the errors go to zero exponentially in $n$.
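One way to see where an estimate of the shape (106) can come from is sketched below; this is our own illustration, not necessarily the argument of [24]. Writing $T = \mathrm{id} + (T - \mathrm{id})$ and expanding $T^{\otimes n}$, terms with at most $f$ factors $T - \mathrm{id}$ are removed by the code, which leaves at most the binomial tail $\sum_{k > f} \binom{n}{k} \delta^k$ with $\delta = \|T - \mathrm{id}\|_{cb}$. A Chernoff-type estimate bounds this tail by the right-hand side of (106) whenever $(f+1)/n \ge \delta/(1+\delta)$. The Python sketch compares the two numerically.

```python
import math


def binary_entropy(p):
    """Binary entropy to base 2, with H_2(0) = H_2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def binomial_tail(n, f, delta):
    """Sum over k > f of C(n, k) * delta**k: weight of the uncorrected error terms."""
    return sum(math.comb(n, k) * delta ** k for k in range(f + 1, n + 1))


def chernoff_form(n, f, delta):
    """(2**H_2((f+1)/n) * delta**((f+1)/n))**n, the right-hand side of Eq. (106)."""
    a = (f + 1) / n
    if a < delta / (1 + delta):
        raise ValueError("this tail estimate assumes (f+1)/n >= delta/(1+delta)")
    return (2 ** binary_entropy(a) * delta ** a) ** n


for n, f, delta in [(100, 9, 0.01), (200, 19, 0.05), (50, 4, 0.02)]:
    print(n, f, delta, binomial_tail(n, f, delta), chernoff_form(n, f, delta))
```

At a fixed ratio $(f+1)/n$ the bound decays exponentially in $n$ exactly when the base in parentheses is below one; with $\delta = 0.05$ and $(f+1)/n = 0.1$ the base exceeds one and the bound is useless.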

We now apply these ideas to a given coding solution for some channel $T$, i.e., we assume that for the given channel we have some encoding of a $d$-level system through $N$ parallel uses of the channel. The nominal rate of this coding scheme, expressed in the units "qubits per channel use", is $\operatorname{ld} d / N$. For large $N$ we may as well assume that $d$ is a prime number, because the ratio of consecutive primes tends to one [51], so rounding $d$ down to a prime changes $\operatorname{ld} d$ only by an amount that goes to zero. Then we apply the above ideas to the encoded channel $\tilde T = D T^{\otimes N} E$. The overall code will require $nN$ channel uses, and the encoded systems are $d^m$-dimensional. If we use $n$ as the index of the resulting sequence of channel uses, the resulting sequence of block lengths grows linearly, hence is clearly subexponential, and has rate $(m \operatorname{ld} d)/(nN)$. The errors go to zero exponentially, provided we can find $f$ satisfying Equation (104) and such that the parenthesis in Equation (106) is strictly less than one. Combining all this gives the following estimate (Theorem 8.2 in [24]):
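The bookkeeping in this paragraph can be spelled out in a few lines. In the Python sketch below (our own illustration, with hypothetical helper names), a code dimension is rounded down to a prime by trial division, the resulting loss in the nominal rate $\operatorname{ld} d / N$ is measured, and the rate $(m \operatorname{ld} d)/(nN)$ of the concatenated scheme is computed.

```python
import math


def is_prime(k):
    """Trial-division primality test; adequate for the moderate sizes used here."""
    if k < 2:
        return False
    return all(k % p for p in range(2, math.isqrt(k) + 1))


def largest_prime_at_most(k):
    """Round a code dimension down to a prime, as the graph-code argument requires."""
    if k < 2:
        raise ValueError("no prime below 2")
    while not is_prime(k):
        k -= 1
    return k


def nominal_rate(d, N):
    """Qubits per channel use for a code carrying a d-level system through N uses."""
    return math.log2(d) / N


def concatenated_rate(m, n, d, N):
    """Rate (m ld d)/(nN) of an outer code of length n applied to the encoded channel."""
    return m * math.log2(d) / (n * N)


p = largest_prime_at_most(2 ** 20)
print(p, nominal_rate(2 ** 20, 20) - nominal_rate(p, 20), concatenated_rate(90, 100, p, 20))
```

For a dimension near $2^{20}$ the loss from rounding to a prime is far below $10^{-3}$ qubits per channel use.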

Proposition 7.2 Let $T$ be a channel, not necessarily between systems of the same dimension. Let $N, d \in \mathbb N$ with $d$ a prime number, and suppose that there are channels $E$ and $D$ encoding and decoding a $d$-level system through $N$ parallel uses of $T$, with error $\Delta = \|D T^{\otimes N} E - \mathrm{id}_d\|_{cb} < 1/(4e)$, with $e = \exp(1)$. Then

$$Q(T) \ge \frac{\operatorname{ld} d}{N}\,(1 - 4e\Delta) - \frac{1}{N}\, H_2(2e\Delta). \tag{107}$$

Moreover, $Q(T)$ is the least upper bound on all expressions of this form, and for coding rates below the bound the errors decrease exponentially.
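Bound (107) can be evaluated for a single given code. The Python sketch below does so, enforcing the hypothesis $\Delta < 1/(4e)$ of Proposition 7.2; the function name and the sample parameters are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy to base 2, with H_2(0) = H_2(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def capacity_lower_bound(d, N, delta):
    """Right-hand side of Eq. (107): (ld d / N)(1 - 4 e Delta) - H_2(2 e Delta) / N."""
    if not 0 <= delta < 1 / (4 * math.e):
        raise ValueError("Proposition 7.2 requires 0 <= Delta < 1/(4e)")
    x = math.e * delta
    return math.log2(d) / N * (1 - 4 * x) - binary_entropy(2 * x) / N


# e.g. a code for a 1009-level system (1009 is prime) through 20 uses with error 0.001
print(capacity_lower_bound(1009, 20, 0.001), math.log2(1009) / 20)
```

The bound sits just below the nominal rate $\operatorname{ld} d / N$ of the given code, and the admissible error $\Delta$ does not depend on $d$.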

Note that here a single successful coding scheme $(E, D)$ guarantees at least a lower bound to the capacity. The most important aspect of this bound is once again that the precision $\Delta$ required does not depend on the dimension $d$. Therefore, even if we know such codes only on an arbitrarily thinly spaced sequence of $N$'s, with vanishing errors along this thin subsequence, we can achieve all rates below the sporadic rate $\limsup_{\mu\to\infty} \operatorname{ld} d_\mu / N_\mu$ by subexponential sequences as well, and hence for any sequence, as required by Definition 2.1. Thus the sporadic capacity is equal to the capacity.

Note that Proposition 7.2 also clarifies the questions brought up in Section 2.7: indeed, a requirement that errors should vanish exponentially fast can be met for any achievable rate strictly below the capacity. Analogous results have been presented very recently by M. Hamada, building on earlier work in [53, 13, …].

Moreover, it is clear from Proposition 7.2 that tolerating finite errors is possible: since we require the capacity $Q_\varepsilon(T)$ to be achieved for arbitrarily large $N$, the second term in Equation (107) also goes to zero, and we get the bound

$$Q_\varepsilon(T) \ge Q(T) \ge Q_\varepsilon(T)\,(1 - 4e\varepsilon). \tag{108}$$

Hence $\lim_{\varepsilon\to 0} Q_\varepsilon(T) = Q(T)$, as claimed.